Methods for identifying and using small RNA predictors

ABSTRACT

The invention provides a method for identifying or detecting small RNA (sRNA) predictors of a disease or a condition. The method comprises identifying one or more sRNA sequences that are present in one or more samples of an experimental cohort, and which are not present across a comparator cohort; and optionally identifying one or more sRNA sequences that are present in one or more samples of a comparator cohort, and which are not present across an experimental cohort. In contrast to identifying dysregulated non-coding RNAs (such as miRs that are up- or down-regulated), the invention identifies sRNAs that are binary predictors, that is, present in one cohort (e.g., an experimental cohort) and not another (e.g., a comparator cohort). Further, by quantifying reads for individual sequences (e.g., iso-miRs), without consolidating reads to annotated reference sequences, the invention unlocks the diagnostic utility of miRs and other sRNAs.

PRIORITY

This application claims the benefit of, and priority to, U.S. Provisional Application No. 62/449,275, filed Jan. 23, 2017, the contents of which are hereby incorporated by reference in their entirety.

BACKGROUND

microRNAs (abbreviated miRNAs or miRs) are small non-coding RNA molecules (about 22 nucleotides in length) found in plants and animals that function in RNA silencing and post-transcriptional regulation of gene expression. miRNAs are located within the cell, as well as in the circulation and extracellular environment, and can be detected in biological fluids.

An analysis of miRNAs highly conserved in vertebrates shows that each has roughly 400 conserved messenger RNA (mRNA) targets. Accordingly, a particular miRNA can reduce the stability of hundreds of unique mRNAs, and may repress the production of hundreds of proteins. This repression is often relatively mild, usually less than 2-fold. Human disease can be associated with deregulation or dysregulation of miRNAs, as demonstrated for chronic lymphocytic leukemia and other B cell malignancies. A manually curated, publicly available database, miR2Disease, documents known relationships between miRNA levels (up- or down-regulated miRNAs) and human disease.

However, despite the clear role that miRNAs and other small non-coding RNAs have in the biology of cells and their association with human disease, their diagnostic potential has not been realized. It is an objective of the present invention to unlock the diagnostic potential of miRNAs and other small, non-coding RNAs (sRNAs).

SUMMARY OF THE INVENTION

In various aspects and embodiments, the invention provides a method for identifying or detecting small RNA (sRNA) predictors of a disease or a condition. The method comprises identifying one or more sRNA sequences that are present in one or more samples of an experimental sample cohort, and which are not present in samples of a comparator cohort (“positive sRNA predictor”). In some embodiments, the method further comprises identifying one or more sRNA sequences that are present in one or more samples of a comparator sample cohort, and which are not present in samples of an experimental cohort (“negative sRNA predictor”). In contrast to identifying dysregulated small RNAs (such as microRNAs (miRNAs or miRs) that are up- or down-regulated), the invention identifies sRNAs that are binary predictors, that is, present in one cohort (e.g., an experimental cohort) and not another (e.g., a comparator cohort). Further, by quantifying reads for individual sequences (e.g., iso-miRs), without consolidating reads to annotated reference sequences, the invention unlocks the diagnostic utility of miRs and other sRNAs. In some embodiments, the one or more sRNA predictors, or a set of sRNA predictors, is validated in an independent cohort of experimental and comparator samples, different from the experimental and comparator samples from which they were discovered, to evaluate the ability of the sRNA predictors to discriminate experimental and comparator samples.

In various embodiments, sRNA predictors are identified from sRNA sequencing data. Specifically, sRNA sequencing data is generated or provided for samples across an experimental cohort and a comparator cohort, for example, using any next-generation sequencing platform. sRNA predictors can be identified in sequence data from any type of biological sample, including solid tissues, biological fluids (e.g., cerebrospinal fluid and blood), or in some embodiments, cultured cells. The invention is applicable to various types of eukaryotic and prokaryotic cells and organisms, including animals, plants, and microbes.

Generally, sRNA predictors can be identified for various utilities in understanding the state of cells or organisms, including utilities in human and animal health, as well as agriculture. For example, the invention finds use in diagnostics, prognostics, drug discovery, toxicology, and therapeutics including personalized medicine. In some embodiments, the invention provides for diagnosis or stratification of a human or animal disease. For example, conditions that can define the experimental cohort include neurodegenerative diseases, cardiovascular diseases, inflammatory and/or immunological diseases, and cancers. Further, sRNA predictors can be identified for detecting a disease state, including early or asymptomatic stage disease (e.g., before noticeable or substantial symptoms appear), or for distinguishing among diseases or conditions that manifest with similar symptoms. Exemplary applications include diagnosis (including early diagnosis) or stratification of neurodegenerative conditions such as Alzheimer's Disease, Parkinson's Disease, Huntington's Disease, Amyotrophic Lateral Sclerosis, and Multiple Sclerosis.

The sRNA predictor(s) may be identified by a software program that quantifies the number of reads for each unique sRNA sequence in each sample in the experimental and comparator cohorts. In various embodiments, the software program trims the adaptor sequences from the individual reads, so as to identify individual sRNAs, including miRs and iso-miRs. In this manner, iso-miRs with templated and non-templated variations at the 3′- and 5′-end are identified, among other sRNAs. After trimming, the sequence reads from the experimental cohort and the comparator cohort can each be compiled into a dictionary, and compared, to identify sequences that are present in one cohort but not the other. Unique sequences and the amount (i.e., read count) of the unique reads for each sample or group of samples in the experimental cohort are annotated. sRNA sequences are not aligned to a reference sequence, and thus each sequence can be individually quantified across samples.
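The trimming and per-sequence counting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software program referred to in the text: it assumes a single known 3′ adaptor (the `ADAPTER_3P` value is a placeholder assumption; the actual sequence depends on the library kit), exact string matching, and illustrative insert-length limits `min_len` and `max_len`. Reads are collapsed by exact sequence only, so each templated or non-templated iso-miR keeps its own count instead of being summed into a reference miRNA.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed 3' adaptor sequence (placeholder; substitute the adaptor used by your kit).
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read: str, adapter: str = ADAPTER_3P, min_overlap: int = 6) -> Optional[str]:
    """Return the insert upstream of the 3' adaptor, or None if no adaptor is found."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The read may end partway through the adaptor: accept a 3'-terminal
    # adaptor prefix of at least `min_overlap` bases, longest match first.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return None


def count_unique_sequences(reads: Iterable[str], min_len: int = 15, max_len: int = 40) -> Counter:
    """Count reads per exact trimmed sequence; nothing is aligned to a reference."""
    counts: Counter = Counter()
    for read in reads:
        insert = trim_adapter(read)
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts
```

Calling `count_unique_sequences` once per sample yields one `{sequence: read count}` dictionary per sample, which is the form in which the experimental and comparator cohorts can then be compared.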

In some embodiments, sRNA predictors are selected that have a read count of at least 5, or at least about 50, in the samples from the experimental cohort that are positive for the sRNA predictor. In still other embodiments, the sRNA predictors are present in at least about 7% of the experimental cohort samples, or are present in at least about 10% of comparator samples. In some embodiments, several sRNA predictors (such as four or more) are identified in the experimental cohort and/or the comparator cohort, which may be selected for inclusion in an sRNA predictor panel. For example, binary predictors identified in the experimental cohort are positive predictors, while binary predictors identified in the comparator cohort are negative predictors.
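A minimal sketch of the cohort comparison and selection thresholds above, assuming each sample has already been reduced to a mapping from exact sRNA sequence to read count (no reference alignment). The function name `binary_predictors` and its defaults (`min_reads=5`, `min_fraction=0.07`) are illustrative choices taken from the example thresholds in the text; "not present" is interpreted here as zero reads in every sample of the other cohort, which is an assumption.

```python
from typing import Dict, Mapping

# sample_id -> {sRNA sequence: read count}
Cohort = Mapping[str, Mapping[str, int]]


def binary_predictors(target: Cohort, other: Cohort,
                      min_reads: int = 5, min_fraction: float = 0.07) -> Dict[str, int]:
    """Sequences found in `target` samples but absent from every `other` sample.

    A sequence counts as present in a target sample only at >= min_reads reads,
    and it must be present in at least min_fraction of the target samples.
    Returns {sequence: number of target samples in which it is present}.
    """
    # A single read in any sample of the other cohort disqualifies a sequence.
    seen_in_other = {seq for counts in other.values() for seq, n in counts.items() if n > 0}
    hits: Dict[str, int] = {}
    for counts in target.values():
        for seq, n in counts.items():
            if n >= min_reads and seq not in seen_in_other:
                hits[seq] = hits.get(seq, 0) + 1
    needed = min_fraction * len(target)
    return {seq: k for seq, k in hits.items() if k >= needed}
```

Under these assumptions, positive predictors would come from `binary_predictors(experimental, comparator)` and negative predictors from `binary_predictors(comparator, experimental, min_fraction=0.10)`.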

In some embodiments, a panel of sRNA predictors is selected for validation or detection of the condition in independent samples. For example, a panel of from 1 to about 200, or from 1 to about 100, or from 1 to about 50, or from 1 to about 10 sRNA predictors can be selected, where the presence of one or more positive predictors (optionally with the absence of one or more negative predictors) is predictive of the condition that defines the experimental cohort. In some embodiments, the presence of 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 positive predictors from the panel, optionally with an absence of the entire panel of negative predictors, is predictive of the condition. While not every experimental sample will be positive for every positive predictor, the panel can be made large enough to provide nearly complete coverage for the condition in the experimental cohort or in independent samples (e.g., the population). For example, the presence of from 1 to about 100, or from 1 to about 50, or from 1 to about 20, or from 1 to about 10 sRNA positive predictors in a sample can be predictive of the condition that defines the experimental cohort. Validation samples can be evaluated by sRNA sequencing, or alternatively by RT-PCR (including Real Time PCR or any quantitative or qualitative PCR format) or another sRNA detection assay.
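The panel decision rule in this paragraph can be written as a small function. In this sketch, which is an assumption rather than a prescribed implementation, `detected` is the set of predictor sequences called present in a sample (for example, by a nonzero read count or by a qPCR Ct below an assay-specific cutoff), `min_positive` is the number of positive predictors required, and the negative-predictor check is optional. The hypothetical `panel_coverage` helper reports the fraction of a cohort that the panel calls positive, one way to check the "nearly complete coverage" criterion.

```python
from typing import Iterable, Mapping, Set


def panel_call(detected: Set[str], positive_panel: Iterable[str],
               negative_panel: Iterable[str] = (), min_positive: int = 1) -> bool:
    """Positive call: enough positive predictors present and no negative predictor present."""
    n_positive = len(detected & set(positive_panel))
    any_negative = bool(detected & set(negative_panel))
    return n_positive >= min_positive and not any_negative


def panel_coverage(samples: Mapping[str, Set[str]], positive_panel: Iterable[str],
                   negative_panel: Iterable[str] = (), min_positive: int = 1) -> float:
    """Fraction of samples in a cohort that the panel calls positive."""
    positive = list(positive_panel)
    negative = list(negative_panel)
    calls = [panel_call(d, positive, negative, min_positive) for d in samples.values()]
    return sum(calls) / len(calls) if calls else 0.0
```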

In various embodiments, detection of the sRNA predictors is migrated to one of various detection platforms, which can employ reverse transcription and amplification, and/or hybridization of a detectable probe (e.g., a fluorescent probe). An exemplary format is the TAQMAN Real Time PCR Assay. Alternatively, sRNA predictors in the panel, or their amplicons, are detected by a hybridization assay.

In other aspects, the invention provides a kit comprising a panel of from 1 to about 200, or from 1 to about 100, or from 1 to 50 sRNA predictor assays, which may include one or both of positive and negative predictors. Such assays may comprise amplification primers and/or probes specific for the detection of the sRNA predictors over annotated sequences, as well as over other (non-predictive) 5′- and/or 3′-templated and/or non-templated variations. In some embodiments, the kit is in the form of an array, and may contain probes specific for the detection of sRNA predictors by hybridization. The majority, or all, of the sRNA predictors are sRNAs other than annotated reference miRNAs; that is, any miRNA predictor contains a variation from a reference miRNA sequence.

In other aspects, the invention provides a method for determining a condition of a subject. The method comprises obtaining a biological fluid sample, and identifying the presence or absence of one or more sRNA predictors identified in RNA sequence data according to the methods described herein, where the presence of one or more positive sRNA predictors in the sample, and optionally the absence of one or more negative predictors, is predictive or diagnostic for the condition. In some embodiments, the sRNA predictor(s) are identified in a sample from a human patient by a detection technology that involves amplification and/or probe hybridization, such as a Real Time PCR (e.g., TAQMAN) assay. The biological fluid sample from the patient can be blood, serum, plasma, urine, saliva, or cerebrospinal fluid.

In various embodiments, the patient is suspected of having a neurodegenerative disease, a cardiovascular disease, an inflammatory and/or immunological disease, or a cancer. For example, the patient may be displaying one or more symptoms of the condition. In some embodiments, the patient is suspected of having a neurodegenerative disease selected from Amyotrophic Lateral Sclerosis (ALS), Parkinson's Disease, Alzheimer's Disease, Huntington's Disease, or Multiple Sclerosis.

The sample is tested across a panel of sRNA detection assays, such as from 1 to about 100, or from about 4 to 100 sRNA detection assays, and in some embodiments the majority of the sRNAs detected in the patient sample (or all of the sRNAs detected in the patient sample) are not annotated reference miRNAs. The panel may, however, include one or more miRNAs for detection as a control.

In other aspects of the invention, positive and/or negative predictors can be employed to classify a mixed population of cells in vivo or ex vivo, through targeted expression of a gene with a detectable or biological impact. For example, a desired protein can be expressed from a gene construct (such as a plasmid) or expressed from mRNA delivered to cells in vivo or ex vivo. In these embodiments, the gene is delivered under the regulatory control of target site(s) specific for the one or more small RNA predictors. The target site(s) (target sites for specific hybridization with the predictors) can be placed in non-coding segments, such as the 3′ and/or 5′ UTRs, such that the encoded protein is only expressed in biologically significant amounts when the desired predictor(s) are absent in the cell. The protein encoded by the construct may be a reporter protein, a transcriptional activator, a transcriptional repressor, a pro-apoptotic protein, a pro-survival protein, a lytic protein, an enzyme, a cytokine, a toxin, or a cell-surface receptor. In these aspects, the predictors can be used to target expression of a desired protein for therapeutic impact, either to target diseased cells for killing, or to protect non-diseased cells from toxic insult.

Other aspects and embodiments of the invention will be apparent from the following examples.

DESCRIPTION OF THE FIGURES

FIGS. 1A and 1B illustrate the standard method for analyzing small RNA sequencing data, as distinguished from embodiments of the present invention. The object of standard processes is to identify dysregulated sRNAs (up- or down-regulated) for validation in larger cohorts using targeted assays such as quantitative PCR (e.g., TAQMAN). For sequence analysis, adapter sequences are trimmed, reads are aligned to a reference, and read numbers are quantified for each reference sRNA. Diagnostic sRNAs are selected based on the level of differential expression between samples and/or groups of samples. FIG. 1A is an illustrative example showing mapped small RNA sequence reads (in this case a miRNA, miR-X) aligned to a reference. As shown, miR-X is present in both a Disease and a Control sample, and is not a homogeneous sequence, but rather a heterogeneous series of iso-miRs that all map to the same region. Lines representing sequence reads are shaded to depict various iso-miR sequences. The light grey box highlights the annotated miR-X reference sequence. FIG. 1B is an illustrative example of how the mapped sequencing data for miR-X from FIG. 1A is condensed and quantified as the sum of all of the iso-miRs for miR-X. In this particular example, miR-X would be considered to have diagnostic value/potential, since there is a 2-fold difference in expression when comparing the Disease and Control samples.
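For contrast, the conventional consolidation shown in FIGS. 1A and 1B can be sketched as summing iso-miR reads per reference miRNA and then taking a fold change. To keep the sketch short, read alignment is replaced by a precomputed `assignment` mapping from each read sequence to its reference miRNA (an assumption), and the sequence labels and counts are hypothetical. The example shows two iso-miRs that occur only in the disease sample being absorbed into a 2-fold change in the summed total.

```python
from collections import defaultdict
from typing import Dict, Mapping


def collapse_to_reference(counts: Mapping[str, int], assignment: Mapping[str, str]) -> Dict[str, int]:
    """Sum read counts of all sequences assigned to the same reference miRNA."""
    totals: Dict[str, int] = defaultdict(int)
    for seq, n in counts.items():
        ref = assignment.get(seq)
        if ref is not None:  # unassigned reads are dropped, like unmapped reads
            totals[ref] += n
    return dict(totals)


def fold_change(case_total: int, control_total: int) -> float:
    """Ratio of summed reads in the case sample to summed reads in the control sample."""
    return case_total / control_total


# Hypothetical iso-miRs of one miRNA ("miR-X"): the annotated sequence ("ref")
# plus two variants that occur only in the disease sample.
assignment = {"ref": "miR-X", "iso_a": "miR-X", "iso_b": "miR-X"}
disease = {"ref": 10, "iso_a": 6, "iso_b": 4}
control = {"ref": 10}

d = collapse_to_reference(disease, assignment)["miR-X"]  # 20
c = collapse_to_reference(control, assignment)["miR-X"]  # 10
print(fold_change(d, c))  # 2.0
```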

FIG. 2 illustrates sequencing data for the human miRNA miR-10b derived from a frontal cortex (region BA9) tissue sample taken from a patient with Huntington's Disease (SRR1759249) or from a non-diseased Healthy Control (SRR1759213). The reference is shown with the annotated miR-10b sequence highlighted. The number of reads for each sequence is shown. In this particular example, there are 8 miR-10b iso-miRs in addition to the annotated miR-10b sequence found in these samples. The total read counts for the Huntington's Disease and Healthy Control samples are 1670 and 336, respectively. Thus, there is an approximately 5-fold greater amount of ‘total’ miR-10b in the Huntington's Disease sample when compared to the Healthy Control (1670/336 ≈ 4.97).

FIGS. 3A and 3B illustrate how miRNA sequencing data is sorted and quantified across samples according to embodiments of the present invention. FIG. 3A illustrates the approach according to the present disclosure, where iso-miRs (or other sRNAs) are sorted by their individual iso-miR sequences, and therefore do not require alignment to a reference. Lines representing sequence reads are shaded to depict identical iso-miR sequences. FIG. 3B shows how sequence reads for iso-miRs (or other sRNAs) are quantified based on their unique sequence, not by alignment to a reference.

FIG. 4 illustrates the analytic method described herein for identifying positive and negative predictors in small RNA sequencing data. As depicted for miR-X, there are 2 binary positive predictors in the Disease sample and 1 binary negative predictor in the Control sample. These positive and negative predictors can be used in a diagnostic panel to test for the condition in which they have been identified. Furthermore, FIG. 4 illustrates that the miR-X annotated sequence is present in equal amounts in both the Disease and Control samples, and is therefore non-diagnostic. Additionally, FIG. 4 illustrates that a miR-X iso-miR is present in both the Disease and Control samples with a 2.5-fold difference; however, because this iso-miR is not binary, it is not included in a diagnostic panel.

FIG. 5 illustrates that quantitative PCR assays (e.g., based on the TAQMAN format) can be designed that give >99.9% specificity for iso-miRs or other sRNAs of interest. Here, hairpin-RT TAQMAN qPCR assays were designed for the indicated annotated miR, iso-miR 1 (which has an additional 3′-terminal uridine), or iso-miR 2 (which has an additional 3′-terminal guanosine). Synthetic RNA, as indicated, was reverse transcribed using a targeted hairpin-RT primer. cDNA was amplified by qPCR in the presence of a TAQMAN probe specific to each RNA sequence. Shown is the percent relative detection of each synthetic RNA by each TAQMAN assay.

FIG. 6 is a heat map in which the top 335 highest-frequency small RNAs found in Huntington's Disease (top), healthy controls (bottom), and both Huntington's Disease and healthy controls (middle) were clustered using Ward's agglomerative clustering with incomplete linkage.

FIG. 7 shows experimental validation of eight positive small RNA predictors identified in Huntington's Disease samples, using reverse transcription (RT) hairpin-based TAQMAN quantitative polymerase chain reaction (qPCR) assays (ThermoFisher Scientific). After clinical information (disease vs. non-disease, and disease grade) was unmasked and the samples were decoded, Ct values were plotted for healthy controls and Huntington's Disease samples.

FIG. 8 shows an analysis of eight biomarkers for a correlation of Ct to disease grade using box-and-whisker plots. Ct values of three biomarkers, named Huntington's Disease Biomarker-4 (HDB-4), HDB-5, and HDB-7, correlated with disease grade by Analysis of Variance (ANOVA).
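As a worked illustration of the ANOVA referred to for FIG. 8, the one-way F statistic compares the variance of Ct values between disease grades with the variance within grades. The sketch below computes F from Ct values grouped by grade; the grouping and any values passed in are assumptions, and it does not reproduce the analysis behind the figure. A p-value additionally requires the F distribution; for example, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for Ct values grouped by disease grade."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of each grade's mean Ct around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each Ct value around its grade's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # perfectly separated groups with no within-grade spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With groups `[[1, 2, 3], [4, 5, 6]]`, the between-group mean square is 13.5 / 1 and the within-group mean square is 4 / 4, giving F = 13.5.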

FIG. 9 is a heat map in which the top 335 highest-frequency small RNAs found in Parkinson's Disease (top), healthy controls (bottom), and both Parkinson's Disease and healthy controls (middle) were clustered using Ward's agglomerative clustering with incomplete linkage. Analysis of tissue from frontal cortex (region BA9), CSF (cerebrospinal fluid), and serum is shown.

FIG. 10 illustrates tissue-specific biomarker overlap for Parkinson's Disease predictors (TIS indicates tissue, CSF indicates cerebrospinal fluid, SER indicates serum).

FIG. 11 is a heat map in which the top 335 highest-frequency small RNAs found in Alzheimer's Disease (top), healthy controls (bottom), and both Alzheimer's Disease and healthy controls (middle) were clustered using Ward's agglomerative clustering with incomplete linkage. Analysis of CSF, serum, and whole blood (WB) is shown.

FIG. 12 illustrates tissue-specific biomarker overlap for Alzheimer's Disease (TIS indicates tissue, CSF indicates cerebrospinal fluid, SER indicates serum, WB indicates whole blood).

FIG. 13 is a heat map in which the top 335 highest-frequency small RNAs found in breast cancer tissue (top), healthy controls (bottom), and both breast cancer and healthy controls (middle) were clustered using Ward's agglomerative clustering with incomplete linkage.

DETAILED DESCRIPTION OF THE INVENTION

In various aspects and embodiments, the invention provides a method for identifying or detecting binary small RNA (sRNA) predictors of a disease or a condition. The method comprises identifying one or more sRNA sequences that are present in one or more samples of an experimental cohort, and which are not present in any of the samples in a comparator cohort (“positive sRNA predictors”). In some embodiments, the method further comprises identifying one or more sRNA sequences that are present in one or more samples of the comparator cohort, and which are not present in any of the samples of the experimental cohort (“negative sRNA predictors”). In contrast to identifying dysregulated sRNAs (such as miRNAs that are up- or down-regulated), the invention identifies sRNAs that are binary predictors, that is, sRNAs that are only present in one cohort (e.g., an experimental cohort) and not another (e.g., a comparator cohort). Further, by quantifying reads for individual sequences (e.g., iso-miRs), without consolidating reads to annotated reference sequences, the invention unlocks the diagnostic utility of miRs and other sRNAs.

In some embodiments, the presence of the one or more sRNA predictors (positive and/or negative predictors) is tested in an independent cohort of experimental and comparator samples, to evaluate the ability of the sRNA predictors to discriminate samples, thereby validating the diagnostic, prognostic, or other utility of the sRNA predictors. Diagnostic kits that detect one or a panel of sRNA predictors (positive and/or negative predictors) in a sample can be prepared in any desired detection format, including quantitative or qualitative PCR or hybridization-based assays, as described more fully herein.

In various embodiments, sRNA sequencing data is generated or provided from a sample or group of samples across an experimental cohort and a comparator cohort, and sRNA predictors are identified in the RNA sequencing data according to the following disclosure.

sRNA sequencing enriches and sequences small RNA species, such as microRNA (miRNA), Piwi-interacting RNA (piRNA), small interfering RNA (siRNA), vault RNA (vtRNA), small nucleolar RNA (snoRNA), transfer RNA-derived small RNAs (tsRNA), ribosomal RNA-derived small RNA fragments (rsRNA), small rRNA-derived RNA (srRNA), and small nuclear RNA (U-RNA). For example, in providing the sRNA sequencing data, input material may be enriched for small RNAs. Sequence library construction is performed with sRNA-enriched material using any of several processes or commercially available kits, depending on the high-throughput sequencing platform being employed. Generally, sRNA sequencing library preparation comprises isolating total RNA from samples, size fractionation, ligation of sequencing adaptors, reverse transcription and PCR amplification, and DNA sequencing.

More particularly, in a given sample all the RNA (i.e., total RNA) is extracted and isolated. The small RNAs are isolated by size fractionation, for example, by running the isolated RNA on a denaturing polyacrylamide gel (or using any of a variety of commercially available kits). A ligation step then adds adaptors to both ends of the small RNAs, which act as primer binding sites during reverse transcription and PCR amplification. For example, a preadenylated single-stranded DNA 3′-adaptor followed by a 5′-adaptor are ligated to the small RNAs using a ligating enzyme such as T4 RNA Ligase 2 Truncated (T4 Rnl2tr K227Q). The adaptors are designed to capture small RNAs with a 5′-phosphate and 3′-hydroxyl group, characteristic of biologically processed small RNAs (e.g., microRNAs), rather than RNA degradation products with a 5′-hydroxyl and 3′-phosphate group. The sRNA library is then reverse transcribed and amplified by PCR. This step converts the adaptor-ligated small RNAs into cDNA clones that are the template for the sequencing reaction. Primers designed with unique nucleotide tags can also be used in this step to create ID tags (i.e., bar codes) for pooled library multiplex sequencing.
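When bar-coded primers are used, pooled reads must be assigned back to their samples before per-sample counting. The sketch below assumes that the bar code for each read is available as a separate index read and that each sample's bar code is known; the function names, the one-mismatch tolerance, and any bar codes supplied are hypothetical, illustrative choices. In practice, demultiplexing is often performed by the sequencing platform's own software.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def mismatches(a: str, b: str) -> int:
    """Positional mismatches between two bar codes, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index_read: str, barcodes: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the one sample whose bar code matches within max_mismatch, else None."""
    hits = [s for s, bc in barcodes.items() if mismatches(index_read, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None  # ambiguous or unmatched reads are discarded


def demultiplex(records: Iterable[Tuple[str, str]], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Split (index_read, insert_read) pairs into per-sample read lists."""
    by_sample: Dict[str, List[str]] = {s: [] for s in barcodes}
    for index_read, read in records:
        sample = assign_sample(index_read, barcodes)
        if sample is not None:
            by_sample[sample].append(read)
    return by_sample
```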

Any DNA sequencing platform can be employed, including any next-generation sequencing platform such as pyrosequencing (e.g., 454 Life Sciences), polymerase-based sequencing-by-synthesis (e.g., Illumina), or sequencing-by-ligation (e.g., ABI SOLiD sequencing platform), among others.

In various embodiments, sequencing data can be generated and/or provided from historical studies, and evaluated for sRNA predictors according to the following disclosure.

The sequencing data can be in any format, such as FASTA or FASTQ format. FASTA format is a text-based format for representing nucleotide sequences, where nucleotides are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences. FASTQ format includes corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity.
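The FASTQ layout described above (a header line beginning with `@`, the sequence, a `+` separator line, and a quality line of the same length) can be read with a short Python generator. This sketch assumes four lines per record with no line wrapping and Phred+33 quality encoding (used by Illumina 1.8+ instruments); files using the older Phred+64 encoding would need a different `PHRED_OFFSET`.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumed Phred+33 (Sanger / Illumina 1.8+) encoding


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, quality_scores) for each 4-line FASTQ record."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        read_id = (header[1:].split() or [""])[0]  # text before the first space
        # Each quality character encodes one Phred score: ord(char) - offset.
        yield read_id, seq, [ord(c) - PHRED_OFFSET for c in qual]
```

Because the function accepts any iterable of lines, an open file handle (for example, `with open(path) as fh: for rid, seq, q in read_fastq(fh): ...`) works as well as an in-memory list.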

sRNA predictors can be identified in any biological samples, including solid tissues and/or biological fluids. sRNA predictors can be identified in prokaryotic or eukaryotic organisms, including animals (e.g., vertebrates and invertebrates), plants, microbes (e.g., bacteria and yeast), or in some embodiments, cultured cells derived from these sources. For example, in some embodiments the experimental and comparator samples are biological fluid samples from human or animal subjects (e.g., a mammalian subject), such as blood, serum, plasma, urine, saliva, or cerebrospinal fluid. miRNAs can be found in biological fluid as a result of a secretory mechanism that may play an important role in cell-to-cell signaling. See Kosaka N, et al., Circulating microRNA in body fluid: a new potential biomarker for cancer diagnosis and prognosis, Cancer Sci. 2010; 101: 2087-2092. miRs from cerebrospinal fluid and serum have been profiled according to conventional methods with the goal of stratifying patients for disease status and pathology features. See Burgos K, et al., Profiles of Extracellular miRNA in Cerebrospinal Fluid and Serum from Patients with Alzheimer's and Parkinson's Diseases Correlate with Disease Status and Features of Pathology, PLOS ONE Vol. 9, Issue 5 (2014). Thus, samples in the experimental cohort and the comparator cohort can be biological fluid samples, such as blood, serum, plasma, urine, saliva, or cerebrospinal fluid. In some embodiments, sRNA predictors are identified in at least two different types of fluid samples. For example, with regard to detection of neurodegenerative disease, sRNA predictors can be identified in both blood (or serum) and cerebrospinal fluid.
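When predictors are identified separately in two or more sample types (for example, serum and cerebrospinal fluid), those found in several sample types can be picked out by simple set counting. The function name, the input dictionary keyed by sample type, and the `min_types` threshold in this sketch are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def shared_predictors(by_type: Mapping[str, Iterable[str]], min_types: int = 2) -> Set[str]:
    """Predictor sequences identified in at least `min_types` sample types."""
    tally: Counter = Counter()
    for predictors in by_type.values():
        tally.update(set(predictors))  # count each sequence once per sample type
    return {seq for seq, n in tally.items() if n >= min_types}
```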

An experimental cohort is a collection of samples that have a defined condition. The experimental cohort can be a collection of samples from human or animal subjects or patients. Conditions include, in some embodiments, neurodegenerative diseases, cardiovascular diseases, inflammatory and/or immunological diseases, and cancers, including particular conditions described more fully below. Experimental cohorts can be further defined based on late-stage or early-stage disease, or course of disease progression, treatment received, and patient response to treatment. An experimental cohort generally comprises a plurality of samples, but in various embodiments, includes at least 1 sample, or at least about 5 samples, or at least about 10 samples, or at least about 15 samples, or at least about 20 samples, or at least about 25 samples, or at least about 50 samples, or at least about 75 samples, or at least about 100 samples, or at least about 150 samples, or at least about 200 samples, or at least about 250 samples. Larger experimental cohorts (e.g., at least 100 samples) are preferred in some embodiments.

A comparator cohort is a collection of samples that do not have the condition that defines the experimental cohort. For example, the comparator cohort can include samples from subjects or patients identified as healthy comparators, or otherwise having a different condition or disease, including conditions or diseases whose symptoms are similar to, but distinct from, those of the disease or condition of interest (e.g., the disease or condition that defines the experimental cohort samples). A comparator cohort generally comprises a plurality of samples, but in various embodiments, includes at least 1 sample, or at least about 5 samples, or at least about 10 samples, or at least about 15 samples, or at least about 20 samples, or at least about 25 samples, or at least about 50 samples, or at least about 75 samples, or at least about 100 samples, or at least about 150 samples, or at least about 200 samples, or at least about 250 samples. Larger comparator cohorts are preferred in some embodiments (e.g., at least 100 samples); however, the comparator cohort may be similar in size to or smaller than the experimental cohort. In some embodiments, the comparator cohort is similar to the experimental cohort in patient make-up, in terms of, for example, age, gender, and/or ethnicity.

sRNA predictors can be identified for various utilities in understanding the state of cells or organisms, including utilities in human and animal health, as well as agriculture. For example, the invention finds use in diagnostics, prognostics, drug discovery, toxicology, and therapeutics including personalized medicine. In some embodiments, the invention provides for diagnosis or stratification of a human or animal disease. For example, sRNA predictors can be identified for detecting a disease state, including early stage or asymptomatic disease (e.g., before noticeable or substantial symptoms), or for distinguishing diseases or conditions that manifest with similar symptoms. In other embodiments, sRNA predictors are identified that distinguish disease courses, such as slowly and quickly progressing disease states, or disease subtypes (e.g., relapsing-remitting MS, secondary progressive MS, primary progressive MS, or progressive relapsing MS), or stratify for disease severity. In these embodiments, experimental and comparator cohorts are designed to distinguish two or more disease states, based upon classification of each patient's disease across the two or more states. In still other embodiments, sRNA predictors identify patients for response to one or more available therapeutic regimens. In these embodiments, experimental and comparator cohorts are designed to distinguish responses to treatment (e.g., by classifying patient samples based upon treatment received by each patient and/or the response achieved). In some embodiments, sRNA predictors are identified that distinguish a toxic response to an environmental or pharmaceutical agent.

In some embodiments, the presence and/or absence of sRNA predictors is applied as a surrogate endpoint to establish safety and/or efficacy of a candidate agent, or for treatment monitoring, by evaluating the presence and/or absence of the sRNA predictors in patient samples during clinical trials or during treatment. For example, positive predictors may be found before treatment with a candidate agent, and may decrease or be eliminated with successful drug treatment. Alternatively, or in addition, negative predictors may be absent before treatment, but may emerge during successful treatment.

With respect to human or animal diagnostics, various types of diseases and conditions can be evaluated in accordance with various embodiments, including neurodegenerative disease, cardiovascular disease, inflammatory and/or immunological disease, and cancer.

Neurodegenerative disease is an umbrella term for the progressive loss of structure or function of neurons, including death of neurons. Exemplary neurodegenerative diseases include Alzheimer's Disease, Amyotrophic Lateral Sclerosis (ALS), Huntington's Disease, Multiple Sclerosis, Parkinson's Disease, and various types of dementia (e.g., Frontotemporal Dementia, Lewy Body Dementia, or Vascular Dementia). Neurodegenerative conditions generally result in progressive degeneration and/or death of neuronal cells. In some embodiments, the neurodegenerative disease results in dementia in at least a substantial portion of patients. In some embodiments, the neurodegenerative disease results in a movement disorder in at least a substantial portion of patients. While conditions can be late-onset, in some embodiments the disease can manifest as early-onset (e.g., before about 50 years of age).

In some embodiments, sRNA predictors are identified in a cohort of Alzheimer's Disease (AD) samples. AD is characterized by loss of neurons and synapses in the cerebral cortex and certain subcortical regions. This loss results in gross atrophy of the affected regions, including degeneration in the temporal lobe and parietal lobe, and parts of the frontal cortex and cingulate gyrus. Alzheimer's Disease has been hypothesized to be a protein misfolding disease, caused by accumulation of abnormally folded amyloid-beta and tau proteins in the brain. In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having AD. Comparator cohort samples can be from patients identified as not having AD, and may optionally include patients with other (non-AD) neurodegenerative or inflammatory disease.

In some embodiments, sRNA predictors are identified in a cohort of Parkinson's Disease (PD) samples. PD manifests as bradykinesia, rigidity, resting tremor, and postural instability. PD is a degenerative disorder of the central nervous system that involves the death of dopamine-generating cells in the substantia nigra, a region of the midbrain. The mechanism by which brain cells are lost in PD may involve an abnormal accumulation of the protein alpha-synuclein bound to ubiquitin in the damaged cells. The alpha-synuclein-ubiquitin complex cannot be directed to the proteasome. This protein accumulation forms proteinaceous cytoplasmic inclusions called Lewy bodies. In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having PD. Comparator cohort samples can be from patients identified as not having PD, and may optionally include patients with other (non-PD) neurodegenerative or inflammatory disease.

In some embodiments, sRNA predictors are identified in a cohort of Huntington's Disease (HD) samples. HD causes astrogliosis and loss of medium spiny neurons. Areas of the brain are affected according to their structure and the types of neurons they contain, reducing in size as they cumulatively lose cells. The areas affected are mainly in the striatum, but also include the frontal and temporal cortices. Mutant huntingtin is an aggregate-prone protein. In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having HD. Comparator cohort samples can be from patients identified as not having HD, and may optionally include patients with other (non-HD) neurodegenerative or inflammatory disease.

In some embodiments, sRNA predictors are identified in a cohort of Amyotrophic Lateral Sclerosis (ALS) samples. ALS is a disease in which motor neurons are selectively targeted for degeneration. Some patients with familial ALS have a missense mutation in the gene encoding the antioxidant enzyme Cu/Zn superoxide dismutase 1 (SOD1). TDP-43 and FUS protein aggregates have been implicated in some cases of the disease, and a mutation in the C9orf72 gene on chromosome 9 is thought to be the most common known cause of sporadic ALS. In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having ALS. Comparator cohort samples can be from patients identified as not having ALS, and may optionally include patients with other (non-ALS) neurodegenerative disease.

In some embodiments, sRNA predictors are identified in a cohort of samples from migraine subjects, such as biological fluid samples from migraine subjects. In some embodiments, the migraine is episodic migraine, chronic migraine, or cluster headache. sRNA predictors in these embodiments are useful for evaluating the subject's condition, or alternatively or in addition, for selecting an appropriate treatment. Comparator cohort samples can be from subjects identified as not having migraine, and may optionally include patients with other non-migraine conditions, or with a different form of migraine from the experimental cohort.

Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. Cardiovascular disease includes coronary artery diseases (CAD) such as angina and myocardial infarction. Other CVDs include stroke, heart failure, hypertensive heart disease, rheumatic heart disease, cardiomyopathy, heart arrhythmia, congenital heart disease, valvular heart disease, carditis, aortic aneurysms, peripheral artery disease, and venous thrombosis.

The underlying mechanisms of coronary artery disease, stroke, and peripheral artery disease involve atherosclerosis, which may be caused by high blood pressure, smoking, diabetes, lack of exercise, obesity, high blood cholesterol, poor diet, and excessive alcohol consumption, among other things. It is estimated that 90% of CVD is preventable by improving risk factors through, for example, healthy eating, exercise, avoidance of tobacco smoke, limiting alcohol intake, and treating high blood pressure. In some embodiments, the experimental cohort comprises samples from patients having coronary artery disease, peripheral artery disease, cerebrovascular disease, cardiomyopathy, hypertensive heart disease, heart failure (e.g., congestive heart failure), pulmonary heart disease, cardiac dysrhythmia, inflammatory heart disease, endocarditis, myocarditis, inflammatory cardiomegaly, valvular heart disease, congenital heart disease, or rheumatic heart disease. The comparator cohort can comprise samples from patients that do not have the CVD, or that have a CVD distinct from that of the experimental cohort.

In some embodiments, sRNA predictors are identified to stratify patients for risk of an acute event related to CVD, such as myocardial infarction or stroke. Existing cardiovascular disease or a previous cardiovascular event, such as a heart attack or stroke, is the strongest predictor of a future cardiovascular event. Age, sex, smoking, blood pressure, blood lipids, and diabetes are important predictors of future cardiovascular disease in people who are not known to have cardiovascular disease. These measures, and sometimes others, may be combined into composite risk scores to estimate an individual's future risk of cardiovascular disease. Numerous risk scores exist, although their respective merits are debated. Other diagnostic tests and biomarkers remain under evaluation, but currently these lack clear-cut evidence to support their routine use (e.g., family history, coronary artery calcification score, high-sensitivity C-reactive protein (hs-CRP), ankle brachial index, lipoprotein subclasses and particle concentration, lipoprotein(a), apolipoproteins A-I and B, fibrinogen, white blood cell count, homocysteine, N-terminal pro B-type natriuretic peptide (NT-proBNP), and markers of kidney function). In some embodiments, the experimental cohort comprises patients at a high risk of myocardial infarction or stroke (e.g., the top 25%, top 20%, or top 10% of risk scores), and the comparator cohort comprises patients with relatively low risk scores for the same (e.g., the bottom quartile or less).
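The cohort definition by risk-score rank can be sketched as follows. This is a minimal illustration, assuming each patient already has a single composite risk score (higher meaning riskier); the function name and defaults are hypothetical, and ties are broken by input order rather than by any clinical rule.

```python
from typing import Dict, List, Tuple


def split_by_risk(
    scores: Dict[str, float],
    top_fraction: float = 0.20,
    bottom_fraction: float = 0.25,
) -> Tuple[List[str], List[str]]:
    """Return (experimental, comparator) patient IDs ranked by risk score.

    Experimental: the highest-scoring `top_fraction` of patients.
    Comparator: the lowest-scoring `bottom_fraction` (e.g., bottom quartile).
    """
    if top_fraction + bottom_fraction > 1:
        raise ValueError("experimental and comparator cohorts would overlap")
    ranked = sorted(scores, key=scores.get)  # patient IDs, lowest risk first
    n = len(ranked)
    n_top = round(top_fraction * n)
    n_bottom = round(bottom_fraction * n)
    experimental = ranked[n - n_top:]  # highest-risk patients
    comparator = ranked[:n_bottom]     # lowest-risk patients
    return experimental, comparator
```

Ranking rather than thresholding keeps the cohort sizes predictable; a fixed score cutoff from a published risk calculator would be an equally reasonable choice.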

In some embodiments, the sRNA predictor identifies or evaluates an immunological or inflammatory disease. For example, in some embodiments, the condition is an autoimmune or inflammatory disorder, such as Lupus (SLE), Scleroderma, Vasculitis, Diabetes mellitus (e.g., Type 1 or Type 2), Graves' disease, Rheumatoid arthritis, Multiple Sclerosis, Fibromyalgia, Psoriasis, Crohn's Disease, Celiac Disease, COPD, or a fibrotic condition such as pulmonary fibrosis (e.g., IPF). In some embodiments, the condition is an inflammatory condition, which may manifest as type I hypersensitivity, type II hypersensitivity, type III hypersensitivity, and/or type IV hypersensitivity. The inflammatory condition may be chronic. In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having a particular inflammatory disease. Comparator cohort samples can be from patients identified as not having the particular inflammatory disease, and may optionally include patients with other inflammatory disease. In some embodiments, the comparator cohort comprises patients with a positive or negative (or even toxic) response to a particular treatment regimen.

In some embodiments, the sRNA predictor is predictive of the presence of cancer, or the presence of an aggressive cancer, or is predictive of remission or recurrence, metastasis, progression-free interval, overall survival, or response to treatment (e.g., radiation therapy, chemotherapy, treatment with a checkpoint inhibitor such as an anti-CTLA4, anti-PD-1, anti-PD-L1, or IDO-targeted agent, or CAR T-cell therapy). In some embodiments, the sRNA predictor is predictive of high toxicity upon treatment with a particular agent. In some embodiments, the sRNA predictors are predictive of a complete response of a particular cancer to a particular treatment. The cancer may be a carcinoma, sarcoma, lymphoma, germ cell tumor, or blastoma. The cancer can occur in sites including, but not limited to, lung, skin, breast, ovary, intestine, pancreas, bone, and brain, among others. In some embodiments, the cancer is stage I or stage II cancer. In other embodiments, the cancer is stage III or stage IV.

Illustrative cancers include, but are not limited to, basal cell carcinoma; biliary tract cancer; bladder cancer; bone cancer; brain and central nervous system cancer; breast cancer; cancer of the peritoneum; cervical cancer; choriocarcinoma; colon and rectum cancer; connective tissue cancer; cancer of the digestive system; endometrial cancer; esophageal cancer; eye cancer; cancer of the head and neck; gastric cancer (including gastrointestinal cancer); glioblastoma; hepatic carcinoma; hepatoma; intra-epithelial neoplasm; kidney or renal cancer; larynx cancer; leukemia; liver cancer; lung cancer (e.g., small-cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, and squamous carcinoma of the lung); melanoma; myeloma; neuroblastoma; oral cavity cancer (lip, tongue, mouth, and pharynx); ovarian cancer; pancreatic cancer; prostate cancer; retinoblastoma; rhabdomyosarcoma; rectal cancer; cancer of the respiratory system; salivary gland carcinoma; sarcoma; skin cancer; squamous cell cancer; stomach cancer; testicular cancer; thyroid cancer; uterine or endometrial cancer; cancer of the urinary system; vulval cancer; lymphoma including Hodgkin's and non-Hodgkin's lymphoma, as well as B-cell lymphoma (including low grade/follicular non-Hodgkin's lymphoma (NHL); small lymphocytic (SL) NHL; intermediate grade/follicular NHL; intermediate grade diffuse NHL; high grade immunoblastic NHL; high grade lymphoblastic NHL; high grade small non-cleaved cell NHL; bulky disease NHL; mantle cell lymphoma; AIDS-related lymphoma; and Waldenstrom's Macroglobulinemia); chronic lymphocytic leukemia (CLL); acute lymphoblastic leukemia (ALL); Hairy cell leukemia; chronic myeloblastic leukemia; as well as other carcinomas and sarcomas; and post-transplant lymphoproliferative disorder (PTLD), as well as abnormal vascular proliferation associated with phakomatoses, edema (e.g., that associated with brain tumors), and Meigs' syndrome.
In some embodiments, the experimental cohort samples are biological fluid samples from patients diagnosed as having a particular defined cancer. Comparator cohort samples can be from patients identified as not having the cancer, and may optionally include patients with other non-cancerous diseases or conditions.

The sRNA predictor may be identified by a software program that quantifies the number of reads for each unique sRNA sequence in each sample in the experimental and comparator sample cohorts. In various embodiments, the software program trims the adaptor sequences from the individual sequence reads, so as to identify individual sRNAs, including miRs, iso-miRs, and other sRNAs. In this manner, iso-miRs with templated and non-templated variations at the 3′ and 5′ ends are identified.

“iso-miR” refers to those sequences that have variations with respect to the reference miRNA sequence (e.g., as used by miRBase). In miRBase, each miRNA is associated with a miRNA precursor and with one or two mature miRNAs (-5p and -3p). Deep sequencing has detected a large amount of variability in miRNA biogenesis, meaning that many different sequences can be generated from the same miRNA precursor. There are four main variations of iso-miRs: (1) 5′ trimming, where the 5′ cleavage site is upstream or downstream from the reference miRNA sequence; (2) 3′ trimming, where the 3′ cleavage site is upstream or downstream from the reference miRNA sequence; (3) 3′ nucleotide addition, where nucleotides are added to the 3′ end of the reference miRNA; and (4) nucleotide substitution, where nucleotides are changed from the miRNA precursor.
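The four variation types can be made concrete with a small classifier. This is a minimal sketch, not the software program described here: it assumes reads and the precursor are DNA strings in the same alphabet, that the 0-based start and end-exclusive coordinates of the reference mature miRNA within the precursor are known (e.g., taken from miRBase), and it uses exact substring search instead of a full aligner. The function name, parameters, and label strings are hypothetical.

```python
from typing import List


def classify_isomir(read: str, precursor: str, ref_start: int, ref_end: int,
                    max_addition: int = 3, max_shift: int = 3,
                    min_core: int = 15) -> List[str]:
    """Label how `read` differs from the reference at precursor[ref_start:ref_end]."""

    def trimming_labels(start: int, end: int) -> List[str]:
        # 5'/3' trimming: the cleavage site lies upstream or downstream of the reference
        labels = []
        if start != ref_start:
            labels.append("5' trimming")
        if end != ref_end:
            labels.append("3' trimming")
        return labels

    # 1. Fully templated: the read occurs verbatim in the precursor.
    pos = precursor.find(read)
    if pos != -1:
        return trimming_labels(pos, pos + len(read)) or ["reference"]

    # 2. 3' nucleotide addition: a templated core plus 1..max_addition 3' nucleotides
    #    that the precursor does not encode at that position.
    for k in range(1, max_addition + 1):
        core = read[:-k]
        if len(core) < min_core:
            break
        pos = precursor.find(core)
        if pos != -1:
            return trimming_labels(pos, pos + len(core)) + ["3' addition"]

    # 3. Single nucleotide substitution, allowing a small shift around the reference start.
    for start in range(ref_start - max_shift, ref_start + max_shift + 1):
        end = start + len(read)
        if start < 0 or end > len(precursor):
            continue
        mismatches = sum(a != b for a, b in zip(read, precursor[start:end]))
        if mismatches == 1:
            return trimming_labels(start, end) + ["substitution"]

    return ["unclassified"]
```

A mismatch at the final 3′ nucleotide cannot be told apart from one nucleotide of 3′ trimming followed by a one-nucleotide addition; this sketch reports it as the latter. Note also that the approach described in this section does not need such labels to find predictors, because each distinct sequence is counted on its own.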

The software program in some embodiments trims a user-defined 3′ sequencing adaptor from the sRNA sequence reads. The adaptor is defined by the user, based on the sequencing platform. By removing the adaptor sequence, iso-miRs and other sRNAs can be identified and quantified in samples. For example, in some embodiments the software program searches for regular expressions corresponding to a user-defined 3′ adaptor and deletes them from the sRNA sequence reads as follows:

a. adaptor sequence

b. adaptor sequence permitting 1 wild-card

c. adaptor sequence permitting 1 insertion

d. adaptor sequence permitting 1 deletion

e. adaptor sequence permitting 2 deletions

f. adaptor sequence permitting 1 deletion and 1 wild-card

g. adaptor sequence permitting 1 insertion and 1 wild-card

h. adaptor sequence permitting 2 wild-cards

i. adaptor sequence permitting 3 wild-cards

j. adaptor sequence permitting 4 wild-cards

A wild-card is defined as being any one of the 4 deoxyribonucleotides: (A) adenine, (T) thymine, (G) guanine, or (C) cytosine. However, the first nucleotide at the 5′ end of the user-specified 3′ adaptor sequence is not altered (i.e., it is not considered an insertion or deletion or otherwise subject to wild-card change), thus preserving sRNA sequences at the junction where the 3′ terminal nucleotide of the sRNA is ligated to the 5′ terminal nucleotide of the 3′ adaptor. If the nucleotide in the read at the position of the adaptor's 5′ terminal nucleotide does not correspond with what the user has specified, the 3′ adaptor sequence is not trimmed, but can be independently verified, if needed.

In some embodiments, sRNAs having a length of at least 15 nucleotides, or at least 20 nucleotides (after trimming), are considered for analysis.
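A minimal sketch of the trimming rules above follows, assuming a plain Python scan in place of regular expressions. It covers only the exact and wild-card (substitution) tiers (a, b, h, i, j); the insertion and deletion tiers (c through g) are deliberately omitted, and reads in which the adaptor runs off the end of the read are not handled. The function names are hypothetical, and the adaptor string in the usage example is an illustrative placeholder rather than a recommendation for any platform.

```python
from typing import Optional


def find_adaptor(read: str, adaptor: str, max_wildcards: int = 4) -> int:
    """Return the start index of the 3' adaptor in `read`, or -1 if not found.

    Tolerance is relaxed one tier at a time (exact, then 1, 2, ... wild-cards);
    within a tier the leftmost hit wins. The adaptor's 5' terminal nucleotide
    must always match exactly, preserving the sRNA/adaptor ligation junction.
    """
    n = len(adaptor)
    for allowed in range(max_wildcards + 1):
        for start in range(len(read) - n + 1):
            window = read[start:start + n]
            if window[0] != adaptor[0]:
                continue  # junction nucleotide is never a wild-card
            mismatches = sum(a != b for a, b in zip(window[1:], adaptor[1:]))
            if mismatches <= allowed:
                return start
    return -1


def trim_read(read: str, adaptor: str, min_length: int = 15,
              max_wildcards: int = 4) -> Optional[str]:
    """Return the trimmed sRNA, or None if the read is set aside."""
    start = find_adaptor(read, adaptor, max_wildcards)
    if start == -1:
        return None  # adaptor not found: leave untrimmed for independent verification
    insert = read[:start]
    return insert if len(insert) >= min_length else None
```

For example, `trim_read("TGAGGTAGTAGGTTGTATAGTT" + "TGGAATTCTCGGGTGCCAAGG", "TGGAATTCTCGGGTGCCAAGG")` returns the 22-nt insert. Relaxing tolerance tier by tier means a well-matched adaptor is always preferred over a looser match earlier in the read.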

After trimming, the sequence reads from the experimental cohort and the comparator cohort can each be compiled into a dictionary and compared, to identify sequences that are present in samples of the experimental cohort but not the comparator cohort (i.e., positive predictors), and/or to identify sequences that are present in the comparator cohort but not the experimental cohort (i.e., negative predictors). Sequence reads that are in both cohorts are discarded, and sequence reads that are unique to either the experimental cohort or the comparator cohort are added to an output file, the unique reads being candidate sRNA predictors. The output file annotates the unique sequences and the count of the unique sequence reads for each sample or group of samples in the cohorts. In various embodiments, the sequence reads are not filtered by a quality score. Further, sRNA sequences are not aligned to a reference sequence, and thus each sequence can be individually quantified across samples.
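The dictionary comparison can be sketched with the standard library. This is a minimal illustration, assuming each sample is given as a list of already-trimmed reads; the function names are hypothetical, and writing the result to an output file is left out. Each exact sequence is counted separately, with no alignment and no collapsing onto annotated reference sequences.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

SampleCounts = Dict[str, Counter]       # sample ID -> Counter of exact sequences
Candidates = Dict[str, Dict[str, int]]  # sequence -> {sample ID: read count}


def count_reads(samples: Dict[str, List[str]]) -> SampleCounts:
    """Count trimmed reads per sample, keyed by the exact sequence."""
    return {sample_id: Counter(reads) for sample_id, reads in samples.items()}


def cohort_sequences(cohort: SampleCounts) -> Set[str]:
    """Union of all sequences seen in any sample of a cohort."""
    return set().union(*cohort.values())


def cohort_unique(experimental: SampleCounts,
                  comparator: SampleCounts) -> Tuple[Candidates, Candidates]:
    """Return (positive, negative) candidate predictors with per-sample counts.

    Sequences seen in both cohorts are discarded.
    """
    exp_seqs = cohort_sequences(experimental)
    comp_seqs = cohort_sequences(comparator)
    positive = {seq: {s: c[seq] for s, c in experimental.items() if seq in c}
                for seq in exp_seqs - comp_seqs}
    negative = {seq: {s: c[seq] for s, c in comparator.items() if seq in c}
                for seq in comp_seqs - exp_seqs}
    return positive, negative
```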

In some embodiments, sRNA predictors are selected that have a count of (or an average count of) at least 5, at least 10, at least 20, at least 50, at least 75, at least 100, at least 200, at least 500, or at least 1000 reads in samples that are positive for the predictor (e.g., in the experimental cohort for positive predictors or the comparator cohort for negative predictors). In some embodiments, one or more (or all) positive sRNA predictors are present in at least about 5%, or at least about 10% of the experimental cohort samples, or at least about 15% of experimental cohort samples, or at least about 20% of experimental cohort samples, or at least about 30% of experimental cohort samples, or at least about 40% or at least about 50% of experimental cohort samples. In some embodiments, at least 1, or at least about 5, or at least about 10, or at least about 20, or at least about 30, or at least about 40, or at least about 50, or at least about 100 positive sRNA predictors are identified in the experimental cohort, and some of these (e.g., from 1 to 100, or from 1 to 50, or from 1 to 10) may be selected for inclusion in an sRNA predictor panel. In some embodiments, from 4 to 100, or from 10 to 100, or from 20 to 100 positive sRNA predictors are selected for inclusion in a panel.

In some embodiments, the negative sRNA predictors are present in at least about 5% of the comparator cohort samples, or at least about 10% of the comparator cohort samples, or at least about 15% of the comparator cohort samples, or at least about 20% of comparator cohort samples, or at least about 30% of comparator cohort samples, or at least about 40% or at least about 50% of comparator cohort samples. In some embodiments, at least 1, or at least about 5, or at least about 10, or at least about 20, or at least about 30, or at least about 40, or at least about 50, or at least about 100 negative sRNA predictors are identified in the comparator cohort, and some of these (e.g., from 1 to 100, or from 1 to 50, or from 1 to 10) may be selected for inclusion in an sRNA predictor panel. In some embodiments, from 4 to 100, or from 10 to 100, or from 20 to 100 negative sRNA predictors are selected for inclusion in a panel.
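The count and prevalence thresholds can be applied to the candidate dictionaries with a short filter. This sketch assumes the "average count" reading (mean read count over the samples in which the sequence is present) and takes prevalence as the fraction of all cohort samples containing it; the same function serves positive candidates (with the experimental cohort size) and negative candidates (with the comparator cohort size). The name and defaults are hypothetical choices from within the ranges listed above.

```python
from statistics import mean
from typing import Dict


def select_predictors(candidates: Dict[str, Dict[str, int]], n_samples: int,
                      min_mean_count: float = 10,
                      min_fraction: float = 0.10) -> Dict[str, float]:
    """Keep candidates that pass both thresholds; return {sequence: prevalence}.

    candidates maps a sequence to {sample ID: read count} within one cohort,
    and n_samples is the total number of samples in that cohort.
    """
    selected = {}
    for seq, per_sample in candidates.items():
        present = [count for count in per_sample.values() if count > 0]
        if not present:
            continue
        prevalence = len(present) / n_samples
        if mean(present) >= min_mean_count and prevalence >= min_fraction:
            selected[seq] = prevalence
    # most prevalent first; sequence as tie-breaker for reproducible output
    return dict(sorted(selected.items(), key=lambda kv: (-kv[1], kv[0])))
```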

A panel of sRNA predictors is selected for validation or detection of the condition in independent samples. For example, a panel of from 2 to about 100 sRNA predictors can be selected, where the presence of any one positive predictor and the absence of all of the negative predictors is predictive of the condition that defines the experimental cohort. In some embodiments, the presence of any 2, 3, 4, 5, 6, 7, 8, 9, or 10 positive sRNA predictors is predictive of the condition, optionally with the absence of the negative predictors. In some embodiments, a panel of from 2 to about 40, or from 2 to about 30, or from 2 to about 20, or from 2 to about 10 sRNA predictors is selected. In some embodiments, from 4 to about 100, or from 4 to about 50, or from 4 to about 20, or from 4 to about 15, or from 4 to about 10 sRNA predictors are selected for inclusion in the panel. In these embodiments, the panel may optionally comprise at least 5, or at least 10, or at least 20 sRNA predictors. While not every experimental sample will be positive for every positive predictor, the panel is large enough to provide at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, or about 100% coverage for the condition in the experimental cohort or in independent samples. That is, the presence of from 1 to 10 positive sRNA predictors (e.g., any one or two) in a sample may be predictive of the condition that defines the experimental cohort. The sample may further be negative for the panel of negative predictors (e.g., from 1 to 10 or from 1 to 5 negative predictors). Validation samples can be evaluated by sRNA sequencing, or alternatively by RT-PCR or another assay.
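The text does not specify how predictors are combined to reach a coverage target. One common heuristic, swapped in here as an assumption, is greedy set cover: repeatedly add the positive predictor that appears in the most not-yet-covered experimental samples. The sketch below assumes presence calls are already available as sets of sample IDs; the function names are hypothetical.

```python
from typing import Dict, List, Set


def coverage(panel: List[str], presence: Dict[str, Set[str]],
             samples: Set[str]) -> float:
    """Fraction of `samples` in which at least one panel predictor was detected."""
    covered: Set[str] = set()
    for predictor in panel:
        covered |= presence[predictor] & samples
    return len(covered) / len(samples)


def greedy_panel(presence: Dict[str, Set[str]], samples: Set[str],
                 target_coverage: float = 0.90, max_size: int = 100) -> List[str]:
    """Greedily add predictors until `target_coverage` of samples are covered.

    presence maps each positive predictor to the experimental samples containing it.
    """
    panel: List[str] = []
    covered: Set[str] = set()
    candidates = sorted(presence)  # sorted so that ties resolve reproducibly
    while len(panel) < max_size and len(covered) < target_coverage * len(samples):
        remaining = [p for p in candidates if p not in panel]
        if not remaining:
            break
        # max() keeps the first maximum it sees, i.e. the alphabetically first on ties
        best = max(remaining, key=lambda p: len((presence[p] & samples) - covered))
        gain = (presence[best] & samples) - covered
        if not gain:
            break  # no remaining predictor adds coverage
        panel.append(best)
        covered |= gain
    return panel
```

Coverage measured on the discovery cohort is optimistic; the resulting panel would still need checking on independent samples, as described above.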

In various embodiments, detection of the sRNA predictors is migrated to one of various detection platforms (e.g., other than RNA sequencing), which can employ reverse transcription, amplification, and/or hybridization of a probe, including quantitative or qualitative PCR, or real-time PCR. PCR detection formats can employ stem-loop primers for RT-PCR in some embodiments, optionally in connection with fluorescently labeled probes.

Generally, real-time polymerase chain reaction (qPCR) monitors the amplification of a targeted DNA molecule during the PCR, i.e., in real time. Real-time PCR can be used quantitatively and semi-quantitatively. Two common methods for the detection of PCR products in real-time PCR are: (1) non-specific fluorescent dyes that intercalate with any double-stranded DNA (e.g., SYBR Green (I or II)), and (2) sequence-specific DNA probes consisting of oligonucleotides that are labelled with a fluorescent reporter, which permits detection only after hybridization of the probe with its complementary sequence (e.g., TAQMAN).

In some embodiments, the assay format is TAQMAN real-time PCR. TAQMAN probes are hydrolysis probes that are designed to increase the specificity of quantitative PCR. The TAQMAN probe principle relies on the 5′ to 3′ exonuclease activity of Taq polymerase to cleave a dual-labeled probe during hybridization to the complementary target sequence, with fluorophore-based detection. TAQMAN probes are dual-labeled with a fluorophore and a quencher, and when the fluorophore is cleaved from the oligonucleotide probe by the Taq exonuclease activity, the fluorophore signal is detected (i.e., the signal is no longer quenched by the proximity of the labels). As in other quantitative PCR methods, the resulting fluorescence signal permits quantitative measurements of the accumulation of the product during the exponential stages of the PCR. The TAQMAN probe format provides high sensitivity and specificity of detection.

In some embodiments, sRNA predictors present in the sample are converted to cDNA using specific primers, e.g., a stem-loop primer. Amplification of the cDNA may then be quantified in real time, for example, by detecting the signal from a fluorescent reporter molecule, where the signal intensity correlates with the level of DNA at each amplification cycle.
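A standard way to read such amplification curves, not described in the text itself, is the threshold cycle (Ct or Cq): the fractional cycle at which the signal first crosses a fixed threshold. The sketch below assumes an idealized signal proportional to template amount, N_c = N_0(1 + E)^c, capped at a plateau. The function names, the efficiency E, and the plateau value are illustrative assumptions, and real instrument software also applies baseline correction.

```python
import math
from typing import List, Optional


def simulate_curve(n0: float, cycles: int = 40, efficiency: float = 1.0,
                   plateau: float = 1e12) -> List[float]:
    """Idealized signal per cycle: exponential growth capped by a plateau."""
    return [min(n0 * (1 + efficiency) ** c, plateau) for c in range(cycles + 1)]


def threshold_cycle(signal: List[float], threshold: float) -> Optional[float]:
    """First fractional cycle at which `signal` reaches `threshold`, or None.

    Interpolates linearly in log space between the two bracketing cycles,
    which is exact while amplification is still exponential.
    """
    for c in range(1, len(signal)):
        lo, hi = signal[c - 1], signal[c]
        if lo < threshold <= hi:
            frac = (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
            return c - 1 + frac
    return None
```

With 100% efficiency, a tenfold difference in starting cDNA shifts Ct by log2(10), about 3.32 cycles, which is why earlier crossing indicates more of the sRNA predictor in the sample.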

Alternatively, sRNA predictors in the panel, or their amplicons, are detected by hybridization. Exemplary platforms include surface plasmon resonance (SPR) and microarray technology. Detection platforms can use microfluidics in some embodiments, for convenient sample processing and sRNA detection.

Generally, any method for determining the presence of sRNAs in samples can be employed. Such methods further include nucleic acid sequence-based amplification (NASBA), flap endonuclease-based assays, as well as direct RNA capture with branched DNA (QuantiGene™), Hybrid Capture™ (Digene), or nCounter™ miRNA detection (NanoString). The assay format, in addition to determining the presence of miRNAs and other sRNAs, may also provide for the control of, inter alia, intrinsic signal intensity variation. Such controls may include, for example, controls for background signal intensity and/or sample processing, and/or hybridization efficiency, as well as other desirable controls for detecting sRNAs in patient samples (e.g., collectively referred to as “normalization controls”).

In some embodiments, the assay format is a flap endonuclease-based format, such as the Invader™ assay (Third Wave Technologies). When using the Invader method, an invader probe containing a sequence specific to the region 3′ to a target site, and a primary probe containing a sequence specific to the region 5′ to the target site of a template and an unrelated flap sequence, are prepared. Cleavase is then allowed to act in the presence of these probes, the target molecule, and a FRET probe containing a sequence complementary to the flap sequence and an auto-complementary sequence that is labeled with both a fluorescent dye and a quencher. When the primary probe hybridizes with the template, the 3′ end of the invader probe penetrates the target site, and this structure is cleaved by the Cleavase, resulting in dissociation of the flap. The flap binds to the FRET probe, and the fluorescent dye portion is cleaved by the Cleavase, resulting in emission of fluorescence.

In some embodiments, RNA is extracted from the sample prior to sRNA processing for detection. RNA may be purified using a variety of standard procedures as described, for example, in RNA Methodologies, A laboratory guide for isolation and characterization, 2nd edition, 1998, Robert E. Farrell, Jr., Ed., Academic Press. In addition, there are various processes as well as products commercially available for isolation of small molecular weight RNAs, including mirVANA™ Paris miRNA Isolation Kit (Ambion), miRNeasy™ kits (Qiagen), MagMAX™ kits (Life Technologies), and PureLink™ kits (Life Technologies). For example, small molecular weight RNAs may be isolated by organic extraction followed by purification on a glass fiber filter. Alternative methods for isolating miRNAs include hybridization to magnetic beads. Alternatively, miRNA processing for detection (e.g., cDNA synthesis) may be conducted in the biofluid sample, that is, without an RNA extraction step.

Generally, assays can be constructed such that each assay is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 98% specific for the sRNA (e.g., iso-miR) over an annotated sequence and/or other non-predictive iso-miRs. Annotated sequences can be determined with reference to miRBase. For example, in preparing sRNA predictor-specific real-time PCR assays, PCR primers and fluorescent probes can be prepared and tested for their level of specificity. Bicyclic nucleotides (e.g., LNA and cEt) or other nucleotide modifications (e.g., MOE, or base modifications) can be employed in probes to increase the sensitivity or specificity of detection.

In some embodiments, the invention provides a kit comprising a panel of from 2 to about 100 sRNA predictor assays, or from about 2 to about 75 sRNA predictor assays, or from 2 to about 40 sRNA predictor assays, or from 2 to about 30, or from 2 to about 20, or from 2 to about 10 sRNA predictor assays. In these embodiments, the kit may comprise at least 5, at least 10, or at least 20 sRNA predictor assays (e.g., reagents for such assays). For example, the kit may comprise assays for at least one positive predictor and at least one negative predictor. In various embodiments, the kit comprises assays for at least 5 positive predictors and at least 2 negative predictors. In some embodiments, the kit comprises a panel of from 4 to about 20, or from 4 to about 15, or from 4 to about 10 sRNA predictor assays. Such assays may comprise reverse transcription (RT) primers, amplification primers, and probes (such as fluorescent probes or dual-labeled probes) specific for the sRNA predictors over annotated sequences as well as other (non-predictive) 5′- and/or 3′-templated and/or non-templated variations. In some embodiments, the kit is in the form of an array or other substrate containing probes for detection of sRNA predictors by hybridization.

In other aspects, the invention provides a method for determining a condition of a cell or organism (including with respect to animals, plants, and microbes). In some embodiments, the invention provides a method for evaluating the condition of a subject or patient. In some embodiments, the method comprises obtaining a biological sample (such as a biological fluid sample from a subject or patient), and identifying the presence or absence of one or more sRNA predictors (identified according to the method described above), thereby determining the condition of the cell or organism (e.g., the condition of the patient). For example, the condition identified is the condition that defines the experimental cohort, with respect to the comparator cohort. In some embodiments, the sRNA predictor(s) are identified in a subject or patient sample by a detection technology that involves amplification and/or probe hybridization, such as RT-PCR or TAQMAN assays, or other detection formats.

In various embodiments, the sample is a biological fluid sample from a patient, and is selected from blood, serum, plasma, urine, saliva, or cerebrospinal fluid. For example, the sample may be a blood sample or a sample derived therefrom. In some embodiments, at least two biological samples are tested, which may be selected from blood, serum, plasma, urine, saliva, and cerebrospinal fluid.

In various embodiments, the patient is suspected of having a neurodegenerative disease, a cardiovascular disease, an inflammatory and/or immunological disease, or a cancer. For example, the patient may be displaying one or more symptoms thereof.

In some embodiments, the patient is suspected of having a neurodegenerative disease selected from Amyotrophic Lateral Sclerosis (ALS), Parkinson's Disease (PD), Alzheimer's Disease (AD), Huntington's Disease (HD), or Multiple Sclerosis (MS). In some embodiments, the patient has signs of dementia or a movement disorder, or CNS lesions.

In some embodiments, the patient has, is suspected of having, or is at risk of a cardiovascular disease (CVD), optionally selected from coronary artery disease (CAD) such as angina and myocardial infarction, stroke, congestive heart failure, hypertensive heart disease, rheumatic heart disease, cardiomyopathy, heart arrhythmia, congenital heart disease, valvular heart disease, carditis, aortic aneurysms, peripheral artery disease, and venous thrombosis. In some embodiments, the patient has a high risk score for heart attack or stroke.

In some embodiments, the patient displays symptoms of an immune or inflammatory disorder, such as Lupus (SLE), Scleroderma, Vasculitis, Diabetes mellitus (e.g., Type 1 or Type 2), Graves' Disease, Rheumatoid Arthritis, Multiple Sclerosis, Fibromyalgia, Psoriasis, Crohn's Disease, Celiac Disease, COPD, or pulmonary fibrosis (e.g., IPF). In some embodiments, the condition is an inflammatory condition, which may manifest as type I hypersensitivity, type II hypersensitivity, type III hypersensitivity, and/or type IV hypersensitivity.

In some embodiments, the patient has cancer, is suspected of having cancer, or is being screened for cancer. The cancer may be bowel cancer, lung cancer, skin cancer, ovarian cancer, or breast cancer, among others. In some embodiments, the cancer is stage I or stage II cancer. In other embodiments, the cancer is stage III or stage IV. In some embodiments, the patient is a candidate for treatment with a checkpoint inhibitor or CAR-T therapy, chemotherapy, neoadjuvant therapy, or radiation therapy.

Illustrative cancers include, but are not limited to, basal cell carcinoma; biliary tract cancer; bladder cancer; bone cancer; brain and central nervous system cancer; breast cancer; cancer of the peritoneum; cervical cancer; choriocarcinoma; colon and rectum cancer; connective tissue cancer; cancer of the digestive system; endometrial cancer; esophageal cancer; eye cancer; cancer of the head and neck; gastric cancer (including gastrointestinal cancer); glioblastoma; hepatic carcinoma; hepatoma; intra-epithelial neoplasm; kidney or renal cancer; larynx cancer; leukemia; liver cancer; lung cancer (e.g., small-cell lung cancer, non-small cell lung cancer, adenocarcinoma of the lung, and squamous carcinoma of the lung); melanoma; myeloma; neuroblastoma; oral cavity cancer (lip, tongue, mouth, and pharynx); ovarian cancer; pancreatic cancer; prostate cancer; retinoblastoma; rhabdomyosarcoma; rectal cancer; cancer of the respiratory system; salivary gland carcinoma; sarcoma; skin cancer; squamous cell cancer; stomach cancer; testicular cancer; thyroid cancer; uterine or endometrial cancer; cancer of the urinary system; vulval cancer; lymphoma including Hodgkin's and non-Hodgkin's lymphoma, as well as B-cell lymphoma (including low grade/follicular non-Hodgkin's lymphoma (NHL); small lymphocytic (SL) NHL; intermediate grade/follicular NHL; intermediate grade diffuse NHL; high grade immunoblastic NHL; high grade lymphoblastic NHL; high grade small non-cleaved cell NHL; bulky disease NHL; mantle cell lymphoma; AIDS-related lymphoma; and Waldenstrom's Macroglobulinemia); chronic lymphocytic leukemia (CLL); acute lymphoblastic leukemia (ALL); Hairy cell leukemia; chronic myeloblastic leukemia; as well as other carcinomas and sarcomas; and post-transplant lymphoproliferative disorder (PTLD), as well as abnormal vascular proliferation associated with phakomatoses, edema (e.g., that associated with brain tumors), and Meigs' syndrome.

In some embodiments, the sample is tested for the presence or absence of at least about 2, or at least about 5, or at least about 10, or at least about 20, or at least about 30, or at least about 40, or at least about 50 sRNA predictors (e.g., from 4 to 50 sRNA predictors), where the presence of from 1 to about 10 positive predictors (or from 1 to 5 positive sRNA predictors) is indicative of the condition. Optionally, the absence of from 1 to 10 negative predictors is further indicative of the condition. In some embodiments, the presence of positive predictors in the panel and the absence of negative predictors in the panel are scored to determine a probability that the patient has the condition of interest.
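The presence/absence rule and a probability-style score can be sketched as follows. `panel_call` encodes the rule stated above (at least `min_positive` positive predictors present and no negative predictor present). `panel_probability` is a hypothetical logistic score; its weights and intercept are placeholders that would have to be fitted on a training cohort, and the text does not say which scoring model is used.

```python
import math
from typing import Dict, Set


def panel_call(detected: Set[str], positives: Set[str], negatives: Set[str],
               min_positive: int = 1) -> bool:
    """True if enough positive predictors and no negative predictors are detected."""
    n_pos = len(detected & positives)
    n_neg = len(detected & negatives)
    return n_pos >= min_positive and n_neg == 0


def panel_probability(detected: Set[str], weights: Dict[str, float],
                      intercept: float) -> float:
    """Hypothetical logistic score over detected predictors.

    Positive predictors would carry positive weights and negative predictors
    negative weights; the values are assumed to come from a fitted model.
    """
    z = intercept + sum(w for p, w in weights.items() if p in detected)
    return 1.0 / (1.0 + math.exp(-z))
```

The rule-based call is easy to audit, while the weighted score lets strongly predictive sRNAs count for more; either could be paired with the panel assays described above.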

Patients that test positive for the condition of interest can then be further diagnosed and/or treated accordingly for the defined condition.

In other aspects of the invention, positive and/or negative predictors can be employed to classify a mixed population of cells in vivo or ex vivo, through targeted expression of a gene with a detectable or biological impact. For example, a desired protein can be expressed from a gene construct (using a vector such as a plasmid or viral vector) or expressed from mRNA delivered to cells in vivo or ex vivo. In these embodiments, the gene is delivered under the regulatory control of target site(s) for the one or more small RNA predictors. The target site(s) (target sites for hybridization with the predictors) can be placed in non-coding segments, such as the 3′ and/or 5′ UTRs, such that the encoded protein is only expressed in biologically significant amounts when the desired predictor(s) are absent from the cell. The protein encoded by the construct may be a reporter protein, a transcriptional activator, a transcriptional repressor, a pro-apoptotic protein, a pro-survival protein, a lytic protein, an enzyme, a cytokine, a toxin, or a cell-surface receptor.
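Designing a target site amounts to taking the reverse complement of the predictor, so that the transcribed UTR pairs with the sRNA along its full length. The sketch below assumes perfectly complementary sites, tandem copies, and an arbitrary 6-nt spacer; the copy number, the spacer, and the function names are hypothetical design choices, not part of the described method. Sequences are returned as DNA for the sense strand of the construct.

```python
from typing import List

# Complement map accepting DNA (T) or RNA (U) input; output is DNA.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def target_site(predictor: str) -> str:
    """Sense-strand DNA that, once transcribed, is fully complementary to the sRNA."""
    return predictor.upper().translate(COMPLEMENT)[::-1]


def utr_cassette(predictors: List[str], copies: int = 2,
                 spacer: str = "ACTAGT") -> str:
    """Tandem target sites for each predictor, joined by a placeholder spacer."""
    sites: List[str] = []
    for predictor in predictors:
        sites.extend([target_site(predictor)] * copies)
    return spacer.join(sites)
```

For example, `target_site("UGAGGUAGUAGGUUGUAUAGUU")` gives `"AACTATACAACCTACTACCTCA"`. Whether a site should be fully complementary or should carry central mismatches depends on the silencing mechanism intended, and would have to be tested for each construct.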

For example, the encoded protein can be a fluorescent protein or an enzyme capable of performing a detectable reaction (e.g., β-galactosidase, alkaline phosphatase, luciferase, or horseradish peroxidase). In these embodiments, all cells expressing the positive or negative predictor will be differentiated from other cells, allowing a sub-population of cells to be accurately identified ex vivo or in vivo. In some embodiments, the genetic constructs enable the identification of specific cell populations for isolation, such as a desired immune cell type or cells with a desired stem cell phenotype, e.g., by fluorescent cell sorting. In vivo, such detectable constructs can also be useful in the treatment of cancer, for example, by aiding in precise surgical removal of the cancer or in targeted radiation or chemotherapy.

In some embodiments, the encoded protein can modulate a cellular pathway or activity of the cell. For example, the alteration in cellular activity can cause or alter apoptotic cell death, replication (e.g., DNA or cellular replication), cell differentiation, or cell migration. For example, apoptosis can be the result of the expression of a death receptor (e.g., FasR or TNFR), a death receptor ligand (e.g., FasL or TNF), a caspase (e.g., caspase 3 or caspase 9), cytochrome-c, a BH3-containing pro-apoptotic protein (e.g., BAX, BAD, BID, or BIM), apoptosis inducing factor (AIF), or a protein toxin. Alternatively, growth arrest can be the result of expression of a protein such as p21, p19ARF, p53, or RB protein, or another tumor suppressor protein. In some embodiments, the encoded protein is a growth factor or cytokine, either an inflammatory or anti-inflammatory cytokine.

In some embodiments, the genetic construct (whether DNA or RNA) is administered to a subject having cancer, an immunological disorder such as an autoimmune disease, a neurodegenerative disorder, a cardiovascular disorder, a metabolic disorder, or an infection (bacterial, viral, or parasitic infection). Administration of the genetic construct targets individual cells with precision based on internal molecular cues (presence or absence of one or more predictors).

In some embodiments, the construct contains a target site specific for a negative sRNA predictor to avoid expression of the encoded protein in non-diseased cells (where the negative predictor will be present). In some embodiments, the encoded protein induces cell death or apoptosis in cells that do not express the negative predictor. In some embodiments, the protein is a toxin or protein that induces apoptosis or cell death.

In other embodiments, the construct contains a target site specific for a positive sRNA predictor to avoid expression of the encoded protein in diseased cells. For example, the encoded protein may protect the cells from insult (e.g., a pro-survival protein), such as an insult in the form of chemotherapy, radiation, or immuno-oncology. In these embodiments, the encoded protein may be under the regulatory control of a target site for a small RNA predictor only present in diseased cells (positive predictor). In these embodiments, the construct would be expressed only in non-diseased cells, limiting damage and toxicity in those cells.

Other aspects and embodiments of the invention will be apparent from the following examples.

EXAMPLES

The conventional approach to miRNA sequence analysis for diagnostic use involves identifying up- or down-regulated miRNAs, typically with reference to an annotated sequence. For data processing and analysis, the goal is to identify dysregulated miRNAs (up- or down-regulated) for validation in larger cohorts using targeted assays such as TAQMAN-based qPCR.

For example, a small RNA fraction is extracted/isolated from samples, 3′ and 5′ adapters are ligated to sRNAs, and sRNAs are reverse transcribed, amplified, and sequenced. During processing, adapter sequences are trimmed (typically using a Smith-Waterman algorithm or a close derivative thereof), and reads are aligned to a reference sequence. Residual sequences are sometimes analyzed by predictive programs to identify new miRNAs. Read numbers are quantified for each reference miRNA. See FIG. 1A illustrating the conventional approach. Current data analysis methods analyze fold-changes between samples (FIG. 1B). Typically, deltas are around 1.8- to 5-fold, which is insufficient for a meaningful diagnostic test.
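For contrast with the binary approach described herein, the fold-change comparison used by conventional methods can be sketched as below. Reads-per-million (RPM) normalization is assumed here for simplicity; conventional pipelines may use other normalizations, and the counts in the usage note are toy numbers.

```python
import math


def reads_per_million(count, library_size):
    """Normalize a raw read count to reads per million (RPM)."""
    return count * 1_000_000 / library_size


def log2_fold_change(count_a, library_a, count_b, library_b):
    """log2 ratio of RPM-normalized counts for one reference miRNA."""
    return math.log2(
        reads_per_million(count_a, library_a) / reads_per_million(count_b, library_b)
    )
```

For example, `log2_fold_change(300, 2_000_000, 100, 1_000_000)` compares 150 RPM with 100 RPM, a 1.5-fold change (log2 of about 0.585). A difference of this size is a shift in abundance of a sequence present in both groups, not a presence/absence signal.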

Furthermore, the term miRNA is a misnomer. For any given miRNA there are multiple iso-miRs that harbor templated and/or non-templated nucleotides at the 5′- and/or 3′-end (see FIG. 2 and FIG. 3). The conventional method for analyzing miRNA sequence data ‘masks’ iso-miR data, since trimmed sequence reads are aligned back to a reference list of miRNA sequences (e.g., a comprehensive list of all cloned miRNAs from whatever species the research is being performed in, typically sourced from miRBase, a miRNA sequence repository). Further, TAQMAN assays used in downstream validation are highly specific for the sequences they are designed to detect, and they are designed against the same reference list of miRNAs from miRBase. Thus, these TAQMAN assays only detect annotated miRNAs, and not closely related sequence variants of the annotated miRNA, including iso-miRs. See Chen C, et al., Real-time quantification of microRNAs by stem-loop RT-PCR, Nucleic Acids Res. 2005, 33(20):e179. Also, see FIG. 5 showing specificity of TAQMAN assays against closely related variants.
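The masking effect can be illustrated with a toy example. In this sketch the reference entry `ref-miR-x` and its sequence are hypothetical, and "alignment" is simplified to a substring test rather than a real mismatch-tolerant aligner; the point is only that reference-based counting merges iso-miRs that per-sequence counting keeps apart.

```python
from collections import Counter

# Hypothetical reference entry; not a real miRBase sequence.
REFERENCE = {"ref-miR-x": "ACGTTGCAAGCTTGCATGCCTA"}


def collapse_to_reference(reads, reference=REFERENCE):
    """Conventional-style counting: credit each read to a reference entry.

    Simplification: a read is credited when it contains, or is contained in,
    the reference sequence (real aligners use mismatch-tolerant alignment).
    """
    counts = Counter()
    for read in reads:
        for name, ref_seq in reference.items():
            if read in ref_seq or ref_seq in read:
                counts[name] += 1
                break
        else:
            counts["unassigned"] += 1
    return counts


def count_exact_sequences(reads):
    """Per-sequence counting: each distinct read (iso-miR) keeps its own count."""
    return Counter(reads)
```

With `reads = [ref, ref, ref + "A", ref[:-2], "G" * 22]`, where `ref` is the hypothetical reference sequence, the collapsed count credits 4 reads to `ref-miR-x`, whereas per-sequence counting reports 4 distinct sequences, keeping the 3′-extended and 3′-trimmed variants separate.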

In embodiments of the process described herein, raw sequencing data is trimmed by identifying and removing the 3′ adapter sequences. The 3′ adapter sequence to be trimmed is user-specified, and thus RNA sequencing data generated from any RNA-sequencing platform can be used. For example, the software can employ ‘pattern matching’ to identify regular expressions (i.e., the user-specified 3′ adapter and, if desired, a defined level of variation from the user-specified 3′ adapter), and then delete them. In this approach there is no ‘fuzzy trimming’, as is seen with a Smith-Waterman algorithm, because only the regular expression and, if desired, the user-specified level of variation from it are trimmed. As a further difference from a Smith-Waterman algorithm, the 5′-most nucleotide of the adapter (i.e., the nucleotide that defines the junction between the small RNA and the 3′ adapter) must be present in a read in order for the regular expression to be recognized by the software program and trimmed. Embodiments of the software accommodate up to: 5 wild cards, 1 insertion, 2 deletions, 1 insertion + 1 wild card, and 1 deletion + 1 wild card. The program can trim nearly 100% of the sequence data, whereas most programs only trim around 80 to 85%. Trimmed sequence data is not aligned to a reference, thereby retaining the individual iso-miR data, as well as many other small RNA families that would otherwise be eliminated, such as: miRNAs not listed in the reference, Piwi-interacting RNA (piRNA), small interfering RNA (siRNA), vault RNA (vtRNA), small nucleolar RNA (snoRNA), transfer RNA-derived small RNA (tsRNA), ribosomal RNA-derived small RNA fragments (rsRNA), small rDNA-derived RNA (srRNA), and small nuclear RNA (U-RNA).
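A minimal Python sketch of the anchored, tolerance-limited adapter search is given below. It is an illustration under stated assumptions, not the software described: the standard `re` module has no approximate matching, so the sketch enumerates adapter variants (insertions as any-base slots, deletions as removed positions) and counts wild cards as mismatches against the read; the tolerance set `DEFAULT_ALLOWED` is one reading of the limits listed above; and reads in which the adapter would run past the end of the read are left untrimmed, so the sketch does not reproduce the near-100% trimming rate.

```python
from itertools import combinations

# Assumed tolerance set: (insertions, deletions, max wild cards).
# "max wild cards" is an upper bound, so (1, 0, 1) also covers a lone insertion.
DEFAULT_ALLOWED = [(0, 0, 5), (1, 0, 1), (0, 1, 1), (0, 2, 0)]


def adapter_templates(adapter, allowed=DEFAULT_ALLOWED):
    """Yield (template, max_wild_cards) pairs for tolerated adapter variants.

    '.' marks an inserted position that may hold any base. Index 0 (the
    junction nucleotide) is never deleted or preceded by an insertion.
    """
    seen = set()
    for n_ins, n_del, max_wc in allowed:
        for dels in combinations(range(1, len(adapter)), n_del):
            base = [c for i, c in enumerate(adapter) if i not in dels]
            for ins in combinations(range(1, len(base)), n_ins):
                template = list(base)
                for offset, pos in enumerate(ins):
                    template.insert(pos + offset, ".")
                key = ("".join(template), max_wc)
                if key not in seen:
                    seen.add(key)
                    yield key


def trim_3prime_adapter(read, adapter, allowed=DEFAULT_ALLOWED):
    """Remove the leftmost tolerated adapter match and everything after it.

    The read is returned unchanged when no tolerated match starts with the
    exact junction nucleotide, or when the adapter runs off the end of the read.
    """
    templates = list(adapter_templates(adapter, allowed))
    for start in range(len(read)):
        if read[start] != adapter[0]:
            continue  # junction nucleotide must be present, unaltered
        for template, max_wc in templates:
            window = read[start:start + len(template)]
            if len(window) < len(template):
                continue  # this variant would extend past the read end
            wild_cards = sum(
                1 for t, r in zip(template[1:], window[1:]) if t != "." and t != r
            )
            if wild_cards <= max_wc:
                return read[:start]
    return read
```

For example, with the adapter TGGAATTCTCGGGTGCCAAGGAACTC used in the Examples, `trim_3prime_adapter(insert + adapter + tail, adapter)` returns `insert`, and the junction check means an adapter copy missing its leading T is never matched at its own start position. Enumerating variants keeps every accepted pattern explicit, in the spirit of the regular-expression approach, at the cost of a few hundred templates per adapter.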

Data is sorted based on individual sequence reads, and identical reads are condensed to a single line per sequence and quantified. Using the condensed/quantified data, the process uses a program to look for ‘unique’ or ‘binary’ RNA sequences that are only present in the cohort of interest. For example, to identify positive predictors, the sequence read content of Group B (i.e., the comparator cohort) is compiled into a dictionary, the sequence read content of each sample in Group A (i.e., the experimental cohort) is compared against the dictionary, and the following equation is executed: Group A - Group B. Positive predictors (i.e., unique/binary reads) found in cohort A are output to a new file and quantified. To identify negative predictors, the sequence read content of Group A (i.e., the experimental cohort) is compiled into a dictionary, the sequence read content of each sample in Group B (i.e., the comparator cohort) is compared against the dictionary, and the following equation is executed: Group B - Group A. Negative predictors (i.e., unique/binary reads) found in cohort B are output to a new file and quantified. When identifying positive predictors and negative predictors, sequences found in both A and B are discarded. That is, the only data that conventional methods use is discarded in accordance with embodiments of the present disclosure. If positive and/or negative predictors are present in more than one sample, data for each sample may be compiled in the same output file, and the total read count across all the samples is calculated. Read frequency (% of samples in which a particular binary sequence occurred) is also calculated. Since the sequences being identified are 100% unique to a particular group or cohort, they are ‘perfect predictors’.
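The dictionary comparison above amounts to a set difference followed by per-sequence tallies. The sketch below assumes each sample's trimmed reads have already been condensed into a `collections.Counter` (sequence to read count); the function name and return layout are illustrative, not taken from the described software.

```python
from collections import Counter


def binary_predictors(group_a, group_b):
    """Sequences present in Group A samples but absent from every Group B sample.

    group_a, group_b: dict mapping sample ID -> Counter(sequence -> read count).
    Returns {sequence: (total_reads_in_A, fraction_of_A_samples_containing_it)}.
    """
    dictionary_b = set()
    for counts in group_b.values():
        dictionary_b.update(counts)  # every sequence observed in the comparator
    total_reads = Counter()
    samples_with = Counter()
    for counts in group_a.values():
        for seq, n_reads in counts.items():
            if seq in dictionary_b:
                continue  # shared by both cohorts: discarded
            total_reads[seq] += n_reads
            samples_with[seq] += 1
    n_samples = len(group_a)
    return {seq: (total_reads[seq], samples_with[seq] / n_samples) for seq in total_reads}
```

`binary_predictors(group_a, group_b)` gives positive predictors and `binary_predictors(group_b, group_a)` gives negative predictors; any sequence seen in both cohorts is dropped in either direction, which is exactly the data a fold-change analysis would keep.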

Once binary predictors are identified, stem-loop-RT-based TAQMAN qPCR assays may be designed against any of the sequences of interest. Stem-loop-RT-based TAQMAN qPCR assays are ultra-specific and give single-nucleotide resolution (FIG. 5). Where assays do not give 100% specificity, introduction of chemical modifications into the stem-loop-RT primer, the qPCR primers, and/or the TAQMAN probe can increase the base-pairing specificity and/or increase the melting temperature (Tm) of annealing. Stem-loop-RT-based TAQMAN qPCR assays can detect as few as 7 copies of a small RNA in a sample.

Example 1: Huntington's Disease

Small RNA sequencing data from GSE64977 was obtained from the GEO Database. Hoss AG, et al., miR-10b-5p expression in Huntington's disease brain relates to age of onset and the extent of striatal involvement. BMC Med Genomics, 2015, Mar. 1; 8:10.

Sequence Read Archive (.sra) files were converted to .fastq format using the SRA Toolkit v2.8.0. Raw small RNA sequencing data was trimmed using the methods described herein with the following adapter sequence: TGGAATTCTCGGGTGCCAAGGAACTC (SEQ ID NO:1). Resulting biomarkers had to be equal to or greater than twenty nucleotides after trimming to be considered for downstream analysis.

Positive and negative predictors were identified by comparing 28 Huntington's Disease samples to 36 healthy control samples. Biomarkers had to be equal to or greater than twenty nucleotides in length, and had to occur at a frequency equal to or greater than 10% of the population to be considered.
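The length and frequency cut-offs translate into a minimum number of samples per cohort: a predictor needs at least 3 of the 28 Huntington's Disease samples (10.7%) or at least 4 of the 36 controls (11.1%) to pass. The sketch below is illustrative; it uses integer arithmetic so that, for example, 10% of 30 samples is exactly 3 rather than a floating-point value slightly above 3, and the `candidates` mapping (sequence to number of samples containing it) is an assumed input format.

```python
def min_samples(cohort_size, percent=10):
    """Smallest whole number of samples that is >= percent% of the cohort."""
    return (cohort_size * percent + 99) // 100


def filter_candidates(candidates, cohort_size, min_length=20, percent=10):
    """Keep candidates of >= min_length nt seen in >= percent% of the cohort.

    candidates: {sequence: number of samples in which the sequence occurs}.
    """
    needed = min_samples(cohort_size, percent)
    return {
        seq: n for seq, n in candidates.items()
        if len(seq) >= min_length and n >= needed
    }
```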

The 335 highest-frequency small RNAs found in Huntington's Disease, healthy controls, and both Huntington's Disease and healthy controls were clustered using Ward's agglomerative clustering with incomplete linkage (FIG. 6).

Eight positive small RNA predictors (only found in Huntington's Disease patients) were selected for experimental validation. Reverse transcription (RT) hairpin-based TaqMan quantitative polymerase chain reaction (qPCR) assays (ThermoFisher Scientific) were designed to specifically target those small RNAs.

Total RNA was extracted from the frontal cortex (region BA9) of 32 healthy controls and 32 Huntington's Disease patients that were postmortem verified for pathology and disease grade, using the miRNeasy Purification Kit from Qiagen (Catalog Number: 217004). cDNA libraries were multiplex-reverse transcribed from 1000 ng of total RNA using the TaqMan MicroRNA Reverse Transcription Kit (ThermoFisher Scientific, Catalog Number: 4366596) and pooled RT primers, according to the manufacturer's protocol. Resultant cDNA libraries were diluted 1:500 with 10 mM Tris pH 8.0 (Millipore, Catalog Number: 648314).

Small RNA predictors were analyzed from 2 µl of cDNA in triplicate by TaqMan qPCR using targeted primers and probes and Universal Master Mix II (ThermoFisher Scientific, Catalog Number: 4440043) in a 5 µl reaction, thermocycled for 50 cycles in an ABI 7900HT Fast Real-Time PCR System fitted with a 384-well heat block.

The following acceptance criteria were applied step-wise to the raw cycle threshold (Ct) values:

(1) Ct values over 39.999999 were excluded from analysis,

(2) samples must have a minimum of 2 replicates to be considered for analysis,

(3) the coefficient of variation (% CV) must be less than 5%; 1 of the triplicate values was allowed to be masked to meet the % CV acceptance criterion (samples with only 2 replicates could not be masked).
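The three step-wise acceptance criteria can be expressed as a small function. This is a sketch under assumptions the text does not state: %CV is taken as the sample standard deviation divided by the mean, undetermined wells are represented as `None`, and when one of three replicates must be masked, the one whose removal gives the lowest %CV is chosen.

```python
from statistics import mean, stdev

CT_CUTOFF = 39.999999  # criterion (1): larger Ct values are excluded


def percent_cv(values):
    """Coefficient of variation in percent (sample SD / mean x 100)."""
    return stdev(values) / mean(values) * 100


def accept_replicates(cts, max_cv=5.0):
    """Apply the step-wise Ct acceptance criteria to one sample's replicates.

    cts: replicate Ct values, with None for undetermined wells.
    Returns the accepted Ct values, or None if the sample fails.
    """
    kept = [ct for ct in cts if ct is not None and ct <= CT_CUTOFF]  # (1)
    if len(kept) < 2:  # (2) at least 2 replicates required
        return None
    if percent_cv(kept) < max_cv:  # (3) %CV below threshold
        return kept
    if len(kept) == 3:
        # (3) one of three replicates may be masked; try each, keep the best pair
        pairs = [kept[:i] + kept[i + 1:] for i in range(3)]
        best = min(pairs, key=percent_cv)
        if percent_cv(best) < max_cv:
            return best
    return None  # a 2-replicate sample cannot be rescued by masking
```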

Clinical information (disease vs. non-disease, and disease grade) was unmasked, the samples were decoded, and Ct values were plotted for healthy controls and Huntington's Disease patients (FIG. 7). Eight biomarkers were analyzed for a correlation of Ct to disease grade using box-and-whisker plots. Ct values of three biomarkers, named Huntington's Disease Biomarker-4 (HDB-4), HDB-5, and HDB-7, correlated with disease grade by analysis of variance (ANOVA) (FIG. 8).
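For reference, the one-way ANOVA F statistic used to test Ct against disease grade can be computed as below. The grouping (one list of Ct values per disease grade) and the toy values in the test are assumptions; a p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`, which is not used here to keep the sketch dependency-free.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of Ct values."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```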

Example 2: Parkinson's Disease

Small RNA sequencing data from GSE72962 and GSE64977 was obtained from the GEO Database. Hoss AG, et al., microRNA Profiles in Parkinson's Disease Prefrontal Cortex, Front Aging Neurosci. 2016, Mar. 1; 8:36.

Small RNA sequencing data from phs000727.v1.p1 was obtained from the dbGaP Database. Sequence Read Archive (.sra) files were converted to .fastq format using the SRA Toolkit v2.8.0. Raw small RNA sequencing data was trimmed using the methods described herein with the following adapter sequence: TGGAATTCTCGGGTGCCAAGGAACTC (SEQ ID NO:1). Resulting biomarkers had to be equal to or greater than twenty nucleotides after trimming to be considered for downstream analysis.

To identify positive and negative binary predictors in frontal cortex (region BA9), 29 Parkinson's samples were compared to 36 healthy control samples. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

To identify positive and negative binary predictors in cerebrospinal fluid, 66 Parkinson's samples and 68 healthy controls were compared. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

To identify positive and negative binary predictors in serum, 60 Parkinson's samples and 70 healthy controls were compared. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

The 335 highest-frequency small RNAs found in Parkinson's Disease, healthy controls, and both Parkinson's Disease and healthy controls were clustered using Ward's agglomerative clustering with incomplete linkage (FIG. 9). Tissue-specific biomarker overlap was determined; only biomarkers having a frequency of greater than 10% were considered for analysis (FIG. 10). As shown in FIG. 10, sRNA predictors can be found in multiple tissues and biological fluids, including serum, and thus can be developed as convenient markers for neurodegenerative diseases such as Parkinson's Disease (PD).
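Tissue-specific overlap, restricted to predictors with a frequency greater than 10%, reduces to set intersections. The sketch assumes per-tissue inputs mapping each predictor sequence to its frequency (fraction of samples); tissue names and values used with it are placeholders, not results from the Examples.

```python
from itertools import combinations


def frequent_predictors(frequencies, threshold=0.10):
    """Keep predictors whose frequency is strictly greater than the threshold."""
    return {seq for seq, freq in frequencies.items() if freq > threshold}


def tissue_overlap(frequencies_by_tissue, threshold=0.10):
    """Return (pairwise shared sets, set shared by all tissues).

    frequencies_by_tissue: {tissue: {sequence: fraction of samples}}.
    """
    kept = {t: frequent_predictors(f, threshold) for t, f in frequencies_by_tissue.items()}
    pairwise = {
        (a, b): kept[a] & kept[b]
        for a, b in combinations(sorted(kept), 2)
    }
    shared_by_all = set.intersection(*kept.values()) if kept else set()
    return pairwise, shared_by_all
```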

Example 3: Alzheimer's Disease

Small RNA sequencing data from GSE46579 was obtained from the GEO Database. Burgos K, et al., Profiles of extracellular miRNA in cerebrospinal fluid and serum from patients with Alzheimer's and Parkinson's diseases correlate with disease status and features of pathology, PLoS One. 2014 May 5;9(5):e94839; Leidinger P, et al., A blood based 12-miRNA signature of Alzheimer disease patients, Genome Biol. 2013 Jul. 29;14(7):R78.

Small RNA sequencing data from phs000727.v1.p1 was obtained from the dbGaP Database. Sequence Read Archive (.sra) files were converted to .fastq format using the SRA Toolkit v2.8.0. Raw small RNA sequencing data was trimmed using the methods described herein with the following adapter sequence: TGGAATTCTCGGGTGCCAAGGAACTC (SEQ ID NO:1). Resulting biomarkers had to be equal to or greater than twenty nucleotides after trimming to be considered for downstream analysis.

To identify positive and negative binary predictors in cerebrospinal fluid, 67 Alzheimer's samples were compared to 68 healthy controls. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

To identify positive and negative binary predictors in serum, 62 Alzheimer's samples were compared to 70 healthy controls. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

To identify positive and negative binary predictors in PAXgene (whole blood), 48 Alzheimer's samples were compared to 22 healthy control samples. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered.

The 335 highest-frequency small RNAs found in Alzheimer's Disease, healthy controls, and both Alzheimer's Disease and healthy controls were clustered using Ward's agglomerative clustering with incomplete linkage (FIG. 11). Tissue-specific biomarker overlap was determined; only biomarkers having a frequency of greater than 10% were considered for analysis (FIG. 12). As shown in FIG. 12, predictors are found in multiple tissues and biological fluids.

Example 4: Breast Cancer

Small RNA sequencing data from GSE29173 was obtained from the GEO Database. Farazi TA, et al., MicroRNA sequence and expression analysis in breast tumors by deep sequencing, Cancer Res. 2011 Jul. 1;71(13):4443-53.

Sequence Read Archive (.sra) files were converted to .fastq format using the SRA Toolkit v2.8.0. Raw small RNA sequencing data was trimmed using the methods described herein with the following adapter sequence: TGGAATTCTCGGGTGCCAAGGAACTC (SEQ ID NO:1), followed by trimming of a 5-mer barcode from each sequence read. Resulting biomarkers had to be equal to or greater than twenty nucleotides after trimming to be considered for downstream analysis.

To identify positive and negative binary predictors in breast cancer tissue, 229 breast cancer samples were compared to 16 healthy controls. Biomarkers had to be equal to or greater than twenty nucleotides, and had to occur at a frequency equal to or greater than 10% of the population to be considered. The 335 highest-frequency small RNAs found in breast cancer, healthy controls, and both breast cancer and healthy controls were clustered using Ward's agglomerative clustering with incomplete linkage (FIG. 13).

1. A method for identifying small RNA (sRNA) predictors, comprising: identifying one or more sRNA sequences that are present in one or more biological samples in an experimental cohort, and which are not present in samples of a comparator cohort, thereby identifying a positive sRNA predictor.
2. The method of claim 1, further comprising identifying one or more sRNA sequences that are present in one or more samples in a comparator cohort, and which are not present in samples of an experimental cohort, thereby identifying a negative sRNA predictor.
3. The method of claim 1 or 2, wherein the one or more sRNA sequences are identified using RNA sequencing data for the experimental and comparator cohorts.
4. The method of any one of claims 1 to 3, further comprising detecting the sRNA predictor(s) in independent experimental and/or comparator samples.
5. The method of claim 4, wherein the sRNA predictor(s) are detected in an independent cohort using a quantitative or qualitative PCR assay.
6. The method of any one of claims 1 to 5, wherein the biological samples are solid tissue, biological fluid, or cultured cells.
7. The method of claim 6, wherein the biological sample is a sample from an animal, plant, or microbe.
8. The method of claim 6, wherein the biological samples are biological fluid samples selected from blood, serum, plasma, urine, saliva, or cerebrospinal fluid.
9. The method of any one of claims 1 to 8, wherein the experimental cohort and the comparator cohort each have at least 10 samples.
10. The method of claim 9, wherein the experimental cohort and the comparator cohort each have at least 100 samples.
11. The method of any one of claims 1 to 10, wherein the experimental cohort comprises samples from patients diagnosed as having a neurodegenerative disease, a cardiovascular disease, an inflammatory or immunological disease, or a cancer.
12. The method of claim 11, wherein the patients in the experimental cohort are diagnosed as having a neurodegenerative disease selected from Alzheimer's Disease, Parkinson's Disease, Amyotrophic Lateral Sclerosis, Huntington's Disease, or Multiple Sclerosis.
13. The method of any one of claims 1 to 12, wherein the positive sRNA predictor(s) are identified by quantifying the number of reads for each unique sRNA sequence in each sample of the experimental cohort; and the negative sRNA predictor(s) are identified by quantifying the number of reads for each unique sRNA sequence in each sample of the comparator cohort.
14. The method of claim 13, wherein a user-defined 3′ sequencing adaptor is trimmed from the sequence reads.
15. The method of claim 14, wherein the following regular expressions of the 3′ sequencing adaptor are deleted:
a. adaptor sequence;
b. adaptor sequence permitting 1 wild-card;
c. adaptor sequence permitting 1 insertion;
d. adaptor sequence permitting 1 deletion;
e. adaptor sequence permitting 2 deletions;
f. adaptor sequence permitting 1 deletion and 1 wild-card;
g. adaptor sequence permitting 1 insertion and 1 wild-card;
h. adaptor sequence permitting 2 wild-cards;
i. adaptor sequence permitting 3 wild-cards;
j. adaptor sequence permitting 4 wild-cards;
wherein: a wild-card is defined as being any 1 of the 4 deoxyribonucleic acids: (A) adenine, (T) thymine, (G) guanine, or (C) cytosine; and the first nucleotide at the 5′ end of the 3′ adaptor sequence is not inserted, deleted, or subject to wild-card change, with the proviso that if the first nucleotide of the 3′ adaptor is not present, the sequence is not trimmed.
16. The method of any one of claims 13 to 15, wherein the sequence reads from the experimental cohort and the comparator cohort are compiled and compared; and wherein sequence reads that are in both cohorts are discarded, and sequence reads that are unique to the experimental cohort or the comparator cohort are candidate sRNA predictors.
17. The method of claim 16, wherein an output file annotates the unique sequences, and annotates the count of the unique reads for each sample or group of samples in the experimental and comparator cohorts.
18. The method of claim 17, wherein sequence reads are not filtered by a quality score.
19. The method of any one of claims 13 to 18, wherein sRNA sequences are not aligned to a reference sequence.
20. The method of any one of claims 17 to 19, wherein sRNA predictors are selected that have a sequence read count of at least 5 in the majority of samples that are positive for the predictor.
21. The method of claim 20, wherein the sRNA predictors are selected that have a count of at least 50 in the majority of samples that are positive for the predictor.
22. The method of claim 20 or 21, wherein positive sRNA predictors are selected that are present in at least 7% of samples in the experimental cohort.
23. The method of claim 20 or 21, wherein positive sRNA predictors are selected that are present in at least 20% of samples in the experimental cohort.
24. The method of claim 22 or 23, wherein from 2 to 50 sRNA predictors are selected for inclusion in an sRNA predictor panel.
25. The method of claim 24, wherein from 4 to 20 sRNA predictors are selected for inclusion in an sRNA predictor panel.
26. The method of claim 24 or 25, wherein the presence of from 1 to 5 of the positive sRNA predictors in a sample, and optionally the absence of all of the 1 to 10 negative predictors in the sample, is indicative of the condition defined by the experimental cohort.
27. The method of any one of claims 24 to 26, wherein the sRNA predictors in the panel are not annotated miRNAs.
28. The method of any one of claims 24 to 27, further comprising preparing a qualitative or quantitative PCR assay to detect the sRNA predictors in the panel in independent samples.
29. A kit comprising a set of PCR primers and detectable probes for specific detection by PCR of the sRNA predictor panel identified in any one of claims 24 to 27.
30. The kit of claim 29, wherein the probes comprise a fluorophore.
31. The kit of claim 30, wherein the probes comprise a quencher.
32. The kit of any one of claims 29 to 31, wherein the kit further comprises a stem-loop RT primer for amplification of the sRNA predictors.
33. A method for determining a condition of a subject, comprising: providing a biological sample, and identifying the presence or absence of the sRNA predictor(s) identified according to the method of any one of claims 1 to 27, or by use of the kit of any one of claims 28 to 32, thereby determining the condition of the subject.
34. The method of claim 33, wherein the sample is a biological fluid sample.
35. The method of claim 34, wherein the biological fluid samples are selected from blood, serum, plasma, urine, saliva, or cerebrospinal fluid.
36. The method of any one of claims 33 to 35, wherein the condition is defined by the experimental cohort.
37. The method of any one of claims 33 to 36, wherein the subject is positive for the condition where the sample tests positive for one or more positive predictors, and negative for all negative predictors.
38. The method of any one of claims 33 to 37, wherein the patient is suspected of having or exhibits symptoms of a neurodegenerative disease, a cardiovascular disease, an inflammatory or immunological disease, or a cancer.
39. The method of claim 38, wherein the patient displays dementia or movement disorder.
40. The method of claim 39, wherein the patient is suspected of having or exhibits symptoms of a neurodegenerative disease selected from Alzheimer's Disease, Parkinson's Disease, Amyotrophic Lateral Sclerosis, Huntington's Disease, and Multiple Sclerosis.
41. The method of any one of claims 33 to 40, wherein the sRNA predictor(s) are identified in the biological sample by qualitative or quantitative PCR assay.
42. The method of claim 41, wherein the PCR assay involves a fluorescently-labeled probe.
43. A method for classifying a mixed population of cells, comprising introducing a gene construct to the cells, the gene construct comprising an encoded protein under the regulatory control of a target site specific for a positive or negative sRNA predictor.
44. The method of claim 43, wherein the gene construct is introduced to the cells in vivo or ex vivo.
45. The method of claim 43 or 44, wherein the gene construct is a plasmid or a viral vector.
46. The method of claim 43 or 44, wherein the gene construct is an mRNA.
47. The method of any one of claims 43 to 46, wherein the target site(s) are placed in non-coding segments.
48. The method of claim 47, wherein the non-coding segment is a 3′ and/or 5′ UTR.
49. The method of claim 48, wherein the encoded protein is only expressed in biologically significant amounts when the sRNA predictor is absent from the cell.
50. The method of any one of claims 43 to 49, wherein the encoded protein is detectable or has a biological impact on the cell.
51. The method of claim 50, wherein the encoded protein is a reporter protein, a transcriptional activator, a transcriptional repressor, a pro-apoptotic protein, a pro-survival protein, a lytic protein, an enzyme, a cytokine, a growth factor, a toxin, or a cell-surface receptor.
52. The method of any one of claims 43 to 51, wherein the construct contains a target site specific for a negative sRNA predictor to avoid expression of the encoded protein in non-diseased cells, wherein the encoded protein optionally induces cell death or apoptosis in cells that do not express the negative predictor.
53. The method of any one of claims 43 to 51, wherein the construct contains a target site specific for a positive sRNA predictor to avoid expression of the encoded protein in diseased cells, wherein the encoded protein optionally protects cells that do not express the positive predictor from insult.